Shares Predictions with Advanced Time-series Models¶

For this challenge, we have a dataset made of 4 time-series with weekly frequency: Share_value, Share_volume, Price and DP. The objective is to analyze and forecast the Share_value and Share_volume for 8 brands on a weekly basis for the next year (52 weeks), and this prediction will be evaluated with the MAE.

For doing it, we will perform an in-depth analysis with an exploratory analysis of the data to understand it, test several advanced forecasting models and select the best one for doing the final predictions of 52 weeks.

Libraries¶

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import pmdarima as pm
import xgboost as xgb

from pylab import rcParams

from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
from statsmodels.tsa.api import VAR

from sklearn.metrics import mean_absolute_error
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

Load dataset¶

Let's load the data and check its format:

In [2]:
data = pd.read_csv('competition_training.csv')
data.head()
Out[2]:
Brand Date_Monday Share_value Share_volume Price DP
0 Brand1 2021-12-12 18.982 16.945 17.531 87.082
1 Brand1 2021-12-19 19.421 17.927 17.386 88.364
2 Brand1 2021-12-26 17.302 16.728 17.426 91.269
3 Brand1 2022-01-02 15.881 15.174 17.439 92.315
4 Brand1 2022-01-09 16.775 14.653 17.772 83.587

Shape:

In [3]:
data.shape
Out[3]:
(896, 6)

Dates:

In [4]:
data['Date_Monday'].unique()
Out[4]:
array(['2021-12-12', '2021-12-19', '2021-12-26', '2022-01-02',
       '2022-01-09', '2022-01-16', '2022-01-23', '2022-01-30',
       '2022-02-06', '2022-02-13', '2022-02-20', '2022-02-27',
       '2022-03-06', '2022-03-13', '2022-03-20', '2022-03-27',
       '2022-04-03', '2022-04-10', '2022-04-17', '2022-04-24',
       '2022-05-01', '2022-05-08', '2022-05-15', '2022-05-22',
       '2022-05-29', '2022-06-05', '2022-06-12', '2022-06-19',
       '2022-06-26', '2022-07-03', '2022-07-10', '2022-07-17',
       '2022-07-24', '2022-07-31', '2022-08-07', '2022-08-14',
       '2022-08-21', '2022-08-28', '2022-09-04', '2022-09-11',
       '2022-09-18', '2022-09-25', '2022-10-02', '2022-10-09',
       '2022-10-16', '2022-10-23', '2022-10-30', '2022-11-06',
       '2022-11-13', '2022-11-20', '2022-11-27', '2022-12-04',
       '2022-12-11', '2022-12-18', '2022-12-25', '2023-01-01',
       '2023-01-08', '2023-01-15', '2023-01-22', '2023-01-29',
       '2023-02-05', '2023-02-12', '2023-02-19', '2023-02-26',
       '2023-03-05', '2023-03-12', '2023-03-19', '2023-03-26',
       '2023-04-02', '2023-04-09', '2023-04-16', '2023-04-23',
       '2023-04-30', '2023-05-07', '2023-05-14', '2023-05-21',
       '2023-05-28', '2023-06-04', '2023-06-11', '2023-06-18',
       '2023-06-25', '2023-07-02', '2023-07-09', '2023-07-16',
       '2023-07-23', '2023-07-30', '2023-08-06', '2023-08-13',
       '2023-08-20', '2023-08-27', '2023-09-03', '2023-09-10',
       '2023-09-17', '2023-09-24', '2023-10-01', '2023-10-08',
       '2023-10-15', '2023-10-22', '2023-10-29', '2023-11-05',
       '2023-11-12', '2023-11-19', '2023-11-26', '2023-12-03',
       '2023-12-10', '2023-12-17', '2023-12-24', '2023-12-31',
       '2024-01-07', '2024-01-14', '2024-01-21', '2024-01-28'],
      dtype=object)

We are going to merge all the brands together so each brand has its column, so the index can be the weekly date. We will use the name Date_Sunday as the dates are really sundays, not mondays (although this is not relevant and in the final forecast we will use the name Date_Monday).

In [5]:
# Convert 'Date_Monday' to datetime format and set it as the index
data = data.rename(columns={'Date_Monday': 'Date_Sunday'})
data["Date_Sunday"] = pd.to_datetime(data["Date_Sunday"])
data.set_index("Date_Sunday", inplace=True)

# Pivot the table to get each Brand's data as separate columns
df = data.pivot_table(index="Date_Sunday", columns="Brand")

# Flatten the multi-index columns and rename them properly
df.columns = [f"{col[0]}_{col[1].replace('Brand', '')}" for col in df.columns]

# Sort
df = df.sort_index()

df.shape
Out[5]:
(112, 32)

So the columns are:

In [6]:
df.columns
Out[6]:
Index(['DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
       'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
       'Price_7', 'Price_8', 'Share_value_1', 'Share_value_2', 'Share_value_3',
       'Share_value_4', 'Share_value_5', 'Share_value_6', 'Share_value_7',
       'Share_value_8', 'Share_volume_1', 'Share_volume_2', 'Share_volume_3',
       'Share_volume_4', 'Share_volume_5', 'Share_volume_6', 'Share_volume_7',
       'Share_volume_8'],
      dtype='object')

Now we have the predictors DP and Price and the targets Share_value and Share_volume in columns. Let's add the week number, month and year as features.

In [7]:
# Create Week number and Year featuers
df['Week_number'] = df.index.to_series().apply(lambda x: int(x.strftime("%W")))
df['Month'] = df.index.month
df['Year'] = df.index.year
df = df.asfreq('W-SUN')

We have 112 weeks of data, of which the distribution of years is the following:

In [8]:
df['Year'].value_counts()
Out[8]:
Year
2023    53
2022    52
2024     4
2021     3
Name: count, dtype: int64

The dataset has slightly more than 2 years of data. As we have to forecast a whole year, this will be a challenge, so our work will take that into account.

Exploratory Data Analysis¶

In this section we are going to analyze the data so we can understand it before applying the forecasting models.

Seasonality¶

As our data spans a bit more than 2 years, we can check if it has seasonality (using, for example, Share_volume_1):

In [9]:
rcParams['figure.figsize'] = 6, 4
decomposed_wind = sm.tsa.seasonal_decompose(df["Share_volume_1"], period=52)
decomposed_wind.plot()
plt.show()
No description has been provided for this image

With seasonal_decompose it is not very clear if there is seasonality. Let's plot the data for one feature for the available 4 years to see it better:

In [10]:
df_2021 = df[df['Year'] == 2021]
df_2022 = df[df['Year'] == 2022]
df_2023 = df[df['Year'] == 2023]
df_2024 = df[df['Year'] == 2024]
In [11]:
# Plot a feature for a brand
feature = 'DP'
brand = "1"

plt.figure(figsize=(12, 4))
plt.plot(df_2021['Week_number'], df_2021[f"{feature}_{brand}"], label="2021")
plt.plot(df_2022['Week_number'], df_2022[f"{feature}_{brand}"], label="2022")
plt.plot(df_2023['Week_number'], df_2023[f"{feature}_{brand}"], label="2023")
plt.plot(df_2024['Week_number'], df_2024[f"{feature}_{brand}"], label="2024")
plt.xlabel("Date")
plt.ylabel(feature)
plt.legend()
plt.show()
No description has been provided for this image

We can see how there are clear signs of seasonality for all the targets and predictors (and for all brands). We will take advantage of this in the models.

Correlations¶

Now, how are the different features correlated? Let's plot their correlations in a heatmap for the 32 features:

In [12]:
# Create the heatmap
plt.figure(figsize=(18, 12))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt='.1f', linewidths=0.5)

# Display the plot
plt.title("Correlation Heatmap")
plt.show()
No description has been provided for this image

Although there are many features, we can take some key insights:

  • The share value of a brand is highly correlated with the share volume. This means that, with forecasting one of them, the other can be easily predicted. During the models, we will use this.
  • There is high correlation in the DP and Price features.
  • The targets for a brand are not necessarily correlated only with the predictors of that brand, so an individual analysis per brand is probably not the best idea.

Stationarity¶

Are the series stationary? Let's check it with adfuller:

In [14]:
stationarity_results = []

for col in df.columns:
    adf_result = adfuller(df[col])
    p_value = adf_result[1]
    if p_value < 0.05:
        stationarity_results.append([col, p_value, 'Stationary'])
    else:
        stationarity_results.append([col, p_value, 'Non-Stationary'])
    

stationarity_df = pd.DataFrame(stationarity_results, columns=['Series', 'p-value', 'Stationarity'])

print(stationarity_df)
            Series   p-value    Stationarity
0             DP_1  0.008691      Stationary
1             DP_2  0.000012      Stationary
2             DP_3  0.016800      Stationary
3             DP_4  0.000172      Stationary
4             DP_5  0.000222      Stationary
5             DP_6  0.014492      Stationary
6             DP_7  0.004929      Stationary
7             DP_8  0.001862      Stationary
8          Price_1  0.079085  Non-Stationary
9          Price_2  0.403474  Non-Stationary
10         Price_3  0.475552  Non-Stationary
11         Price_4  0.000292      Stationary
12         Price_5  0.482347  Non-Stationary
13         Price_6  0.222301  Non-Stationary
14         Price_7  0.299075  Non-Stationary
15         Price_8  0.146947  Non-Stationary
16   Share_value_1  0.000001      Stationary
17   Share_value_2  0.000003      Stationary
18   Share_value_3  0.000001      Stationary
19   Share_value_4  0.001027      Stationary
20   Share_value_5  0.000562      Stationary
21   Share_value_6  0.000318      Stationary
22   Share_value_7  0.001857      Stationary
23   Share_value_8  0.000263      Stationary
24  Share_volume_1  0.000252      Stationary
25  Share_volume_2  0.000338      Stationary
26  Share_volume_3  0.001991      Stationary
27  Share_volume_4  0.002571      Stationary
28  Share_volume_5  0.000459      Stationary
29  Share_volume_6  0.000035      Stationary
30  Share_volume_7  0.019241      Stationary
31  Share_volume_8  0.000661      Stationary
32     Week_number  0.041814      Stationary
33           Month  0.059621  Non-Stationary
34            Year  0.542540  Non-Stationary

7 of the Price variables are non-stationary (apart from the trivial Year), but not any target or DP predictor. However, we will let the models that need stationarity in the exogenous variables handle it naturally, as most of the models that we are going to use do not need to stationarize this predictors.

Autocorrelations¶

In the following plots, we can see the ACF and PACF plots for each brand for the 4 features. However, we will not select manually the ARIMA parameters using these plots, as it is costly and we want to look for the best model using auto_arima.

DP¶

In [16]:
fig, axes = plt.subplots(4, 2, figsize=(12, 8))

# Loop over the brands and plot ACF for each feature in a 4x2 grid
for i in range(1, 9):
    
    # Plot
    ax = axes[(i-1)//2, (i-1)%2]
    plot_acf(df[f'DP_{i}'], ax=ax, lags=40)
    ax.set_title(f'ACF of DP_{i}')

plt.tight_layout()
plt.show()
No description has been provided for this image
In [17]:
fig, axes = plt.subplots(4, 2, figsize=(12, 8))

for i in range(1, 9):
    
    ax = axes[(i-1)//2, (i-1)%2]
    plot_pacf(df[f'DP_{i}'], ax=ax, lags=40)
    ax.set_title(f'PACF of DP_{i}')

plt.tight_layout()
plt.show()
No description has been provided for this image

Price¶

In [18]:
fig, axes = plt.subplots(4, 2, figsize=(12, 8))

for i in range(1, 9):
    
    ax = axes[(i-1)//2, (i-1)%2]
    plot_acf(df[f'Price_{i}'], ax=ax, lags=40)
    ax.set_title(f'ACF of Price_{i}')

plt.tight_layout()
plt.show()
No description has been provided for this image
In [19]:
fig, axes = plt.subplots(4, 2, figsize=(12, 8))

for i in range(1, 9):
    
    ax = axes[(i-1)//2, (i-1)%2]
    plot_pacf(df[f'Price_{i}'], ax=ax, lags=40)
    ax.set_title(f'PACF of Price_{i}')

plt.tight_layout()
plt.show()
No description has been provided for this image

Share Value¶

In [20]:
fig, axes = plt.subplots(4, 2, figsize=(12, 8))

for i in range(1, 9):
    
    ax = axes[(i-1)//2, (i-1)%2]
    plot_acf(df[f'Share_value_{i}'], ax=ax, lags=40)
    ax.set_title(f'ACF of Share_value_{i}')

plt.tight_layout()
plt.show()
No description has been provided for this image
In [21]:
fig, axes = plt.subplots(4, 2, figsize=(12, 8))

for i in range(1, 9):
    
    ax = axes[(i-1)//2, (i-1)%2]
    plot_pacf(df[f'Share_value_{i}'], ax=ax, lags=40)
    ax.set_title(f'PACF of Share_value_{i}')

plt.tight_layout()
plt.show()
No description has been provided for this image

Share Volume¶

In [22]:
fig, axes = plt.subplots(4, 2, figsize=(12, 8))

for i in range(1, 9):
    
    ax = axes[(i-1)//2, (i-1)%2]
    plot_acf(df[f'Share_volume_{i}'], ax=ax, lags=40)
    ax.set_title(f'ACF of Share_volume_{i}')

plt.tight_layout()
plt.show()
No description has been provided for this image
In [23]:
fig, axes = plt.subplots(4, 2, figsize=(12, 8))

for i in range(1, 9):
    
    ax = axes[(i-1)//2, (i-1)%2]
    plot_pacf(df[f'Share_volume_{i}'], ax=ax, lags=40)
    ax.set_title(f'PACF of Share_volume_{i}')

plt.tight_layout()
plt.show()
No description has been provided for this image

Feature Engineering¶

For the last part of the exploratory data analysis, we are going to add several features that can be helpful for the following models. These are the current predictors and targets:

In [24]:
df.columns
Out[24]:
Index(['DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
       'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
       'Price_7', 'Price_8', 'Share_value_1', 'Share_value_2', 'Share_value_3',
       'Share_value_4', 'Share_value_5', 'Share_value_6', 'Share_value_7',
       'Share_value_8', 'Share_volume_1', 'Share_volume_2', 'Share_volume_3',
       'Share_volume_4', 'Share_volume_5', 'Share_volume_6', 'Share_volume_7',
       'Share_volume_8', 'Week_number', 'Month', 'Year'],
      dtype='object')

There is a clear sign of seasonality, so we are going to exploit it. As our final objective is to forecast 52 weeks, we are not going to depend on small lags (like 1 or 5), as the error can get very big with each forecasted week. Instead, we are going to use the value from last year, so we can do more solid long-term forecasting.

In [25]:
# Create lag features for Share_value and Share_volume (1 year ago)
for i in range(1, 9):  # Loop over the 8 brands
    df[f'Share_value_{i}_ly'] = df[f'Share_value_{i}'].shift(52)
    df[f'Share_volume_{i}_ly'] = df[f'Share_volume_{i}'].shift(52)

df.columns
Out[25]:
Index(['DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
       'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
       'Price_7', 'Price_8', 'Share_value_1', 'Share_value_2', 'Share_value_3',
       'Share_value_4', 'Share_value_5', 'Share_value_6', 'Share_value_7',
       'Share_value_8', 'Share_volume_1', 'Share_volume_2', 'Share_volume_3',
       'Share_volume_4', 'Share_volume_5', 'Share_volume_6', 'Share_volume_7',
       'Share_volume_8', 'Week_number', 'Month', 'Year', 'Share_value_1_ly',
       'Share_volume_1_ly', 'Share_value_2_ly', 'Share_volume_2_ly',
       'Share_value_3_ly', 'Share_volume_3_ly', 'Share_value_4_ly',
       'Share_volume_4_ly', 'Share_value_5_ly', 'Share_volume_5_ly',
       'Share_value_6_ly', 'Share_volume_6_ly', 'Share_value_7_ly',
       'Share_volume_7_ly', 'Share_value_8_ly', 'Share_volume_8_ly'],
      dtype='object')

Performance Evaluation¶

Before starting developing the models, we need a way to compare between them. As we have 112 weeks of data and we want to exploit the seasonality, we need at least two full years for our training set (this is, 104 weeks). This lets us with just 8 weeks for the test set. However, as we said, we are not using small lags but seasonality, so we do not need a test set similar to 52 weeks to do a good assesment of the performance. If we used small lags, errors for 8 forecasted weeks could be small but they could increase a lot when forecasting 52 weeks. However, our models will take profit of the seasonality, so 8 weeks, although small, is a valid evaluator for the performance.

In [26]:
targets = ['Share_value', 'Share_volume']
results_df = pd.DataFrame()
In [ ]:
L = 104  # Lenght of training set (2 years)
train = df.iloc[:L]   # Rows 1 to 60 for training
test = df.iloc[L:112]  # Rows 61 to 112 for testing

print(len(train), len(test))
104 8

We are going to drop the predictors from the test set, as we will not have the data in the real case.

In [28]:
dp_price_columns = [f'DP_{i}' for i in range(1, 9)] + [f'Price_{i}' for i in range(1, 9)]
dp_price_true_df = test[[f'DP_{i}' for i in range(1, 9)] + [f'Price_{i}' for i in range(1, 9)]]
test = test.drop(columns=dp_price_columns, errors='ignore')
test.columns
Out[28]:
Index(['Share_value_1', 'Share_value_2', 'Share_value_3', 'Share_value_4',
       'Share_value_5', 'Share_value_6', 'Share_value_7', 'Share_value_8',
       'Share_volume_1', 'Share_volume_2', 'Share_volume_3', 'Share_volume_4',
       'Share_volume_5', 'Share_volume_6', 'Share_volume_7', 'Share_volume_8',
       'Week_number', 'Month', 'Year', 'Share_value_1_ly', 'Share_volume_1_ly',
       'Share_value_2_ly', 'Share_volume_2_ly', 'Share_value_3_ly',
       'Share_volume_3_ly', 'Share_value_4_ly', 'Share_volume_4_ly',
       'Share_value_5_ly', 'Share_volume_5_ly', 'Share_value_6_ly',
       'Share_volume_6_ly', 'Share_value_7_ly', 'Share_volume_7_ly',
       'Share_value_8_ly', 'Share_volume_8_ly'],
      dtype='object')

Here we can see an example of the split done:

In [29]:
feature = 'Share_value'
brand = "1"

plt.figure(figsize=(12, 4))
plt.plot(train.index, train[f"{feature}_{brand}"], label="Train")
plt.plot(test.index, test[f"{feature}_{brand}"], label="Test")
plt.xlabel("Date")
plt.ylabel(feature)
plt.legend()
plt.show()
No description has been provided for this image

Naive Models¶

The first models that we are going to develop are the "naive" models. They will serve as a baseline for comparing with the rest of them.

Naive Median¶

The first model will use the median for the forecast (as it minimizes the MAE, the performance metric). Here we can see an example:

In [31]:
feature = 'Share_value'
brand = "1"

median_value = train[f"{feature}_{brand}"].median()
forecast = [median_value] * len(test)

plt.figure(figsize=(12, 4))
plt.plot(train.index, train[f"{feature}_{brand}"], label="Train")
plt.plot(test.index, test[f"{feature}_{brand}"], label="Test")
plt.plot(test.index, forecast, label="Forecast")
plt.xlabel("Date")
plt.ylabel(f"{feature}_{brand}")
plt.legend()
plt.show()
No description has been provided for this image

And here we compute it for all targets and calculate and store the resulting MAE:

In [32]:
maes = []
for feature in targets:
    for brand in range(1, 8+1):
        median_value = train[f"{feature}_{brand}"].median()
        forecast = [median_value] * len(test)
        maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
        
mae = sum(maes) / len(maes)
print(f'MAE = {mae}')
results_df.loc['Naive Median', 'MAE'] = mae
MAE = 1.5345234374999999

Naive Seasonal¶

This naive model will use the value from the last year of each target for the forecast:

In [33]:
for feature in targets:
    for brand in range(1, 8+1):

        # Use the value from the same week last year
        forecast = train[f"{feature}_{brand}"].shift(52).iloc[-len(test):].values
        
        # Compute MAE
        maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))

mae = sum(maes) / len(maes)
print(f'MAE = {mae}')
results_df.loc['Naive Seasonal', 'MAE'] = mae
MAE = 1.7515273437499999

Now, our objective is to get a lower MAE from our forecasting models.

Automatic Models¶

These are the "simplest" models that we are going to use, as they do not need any predictors for the target (just the past target values). Also, they allow us to visualize their confidence intervals, which is very useful.

ETS¶

As we said, we want to take advantage of the seasonality, so ETS is promising for our data. For each model, we will first do a test for one target so we can see how it works, Then, we will do the forecast for all 16 targets as also visualize the resulting forecast, comparing them to the real test values.

Testing for one target¶

In [ ]:
feature = 'Share_value'
brand = "1"

model_ets = ETSModel(
    train[f"{feature}_{brand}"],
    trend='add',
    seasonal='add',
    seasonal_periods=52
).fit()
In [35]:
pred = model_ets.get_prediction(start =  test.index[0], end = test.index[-1])
cis = pred.pred_int(alpha = .05) #confidence interval

lower_ci = cis['lower PI (alpha=0.050000)']
upper_ci = cis['upper PI (alpha=0.050000)']
In [36]:
forecast = model_ets.forecast(steps=len(test))

plt.figure(figsize=(12, 4))
plt.plot(train.index, train[f"{feature}_{brand}"], label="Train")
plt.plot(test.index, test[f"{feature}_{brand}"], label="Test")
plt.plot(test.index, forecast, label="Forecast")
plt.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
plt.xlabel("Date")
plt.ylabel(f"{feature}_{brand}")
plt.legend()
plt.grid()
plt.show()
No description has been provided for this image

Forecasting all targets¶

In [ ]:
maes = []
maes_ets = {}
forecasts_value = {}
forecasts_value_lci = {}
forecasts_value_uci = {}
forecasts_volume = {}
forecasts_volume_lci = {}
forecasts_volume_uci = {}
forecasts_ets = {}

for feature in targets:
    for brand in range(1, 8+1):
        model_ets = ETSModel(
            train[f"{feature}_{brand}"],
            trend='add',
            seasonal='add',
            seasonal_periods=52
        ).fit()
        pred = model_ets.get_prediction(start =  test.index[0], end = test.index[-1])
        cis = pred.pred_int(alpha = .05) #confidence interval
        lower_ci = cis['lower PI (alpha=0.050000)']
        upper_ci = cis['upper PI (alpha=0.050000)']
        forecast = model_ets.forecast(steps=len(test))
        forecasts_ets[f"{feature}_{brand}"] = forecast
        if feature == 'Share_value':
            forecasts_value[f"{feature}_{brand}"] = forecast
            forecasts_value_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_value_uci[f"{feature}_{brand}"] = upper_ci
        else:
            forecasts_volume[f"{feature}_{brand}"] = forecast
            forecasts_volume_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_volume_uci[f"{feature}_{brand}"] = upper_ci
        maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
        maes_ets[f"{feature}_{brand}"] = mean_absolute_error(test[f"{feature}_{brand}"], forecast)
In [38]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['ETS', 'MAE'] = mae
MAE = 0.709

Predictions for Share_value:

In [39]:
# Create a 4x2 subplot for Share_value forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten()  # Flatten the array of axes for easier indexing

# Loop through each brand and plot the forecasts
for brand in range(1, 9):  # Brands are 1 to 8
    # Define the feature and the key for the forecast
    feature = "Share_value"
    forecast_key = f"{feature}_{brand}"

    # Get the forecast for the current brand
    forecast = forecasts_value[forecast_key]
    lower_ci = forecasts_value_lci[forecast_key]
    upper_ci = forecasts_value_uci[forecast_key]

    # Get the corresponding train and test data
    train_data = train[f"{feature}_{brand}"]
    test_data = test[f"{feature}_{brand}"]

    # Select the appropriate axis for this brand
    ax = axs[brand - 1]

    # Plot the train, test, and forecast values
    ax.plot(train_data.index, train_data, label="Train", color='blue')
    ax.plot(test_data.index, test_data, label="Test", color='green')
    ax.plot(test_data.index, forecast, label="Forecast", color='red')
    ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')

    # Set labels and title
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    #ax.set_title(f"Brand {brand}")
    ax.legend()

    # Set x-axis limits
    ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))

# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
No description has been provided for this image

Predictions for Share_volume:

In [40]:
# Create a 4x2 subplot for Share_volume forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten()  # Flatten the array of axes for easier indexing

# Loop through each brand and plot the forecasts
for brand in range(1, 9):  # Brands are 1 to 8
    # Define the feature and the key for the forecast
    feature = "Share_volume"
    forecast_key = f"{feature}_{brand}"

    # Get the forecast for the current brand
    forecast = forecasts_volume[forecast_key]
    lower_ci = forecasts_volume_lci[forecast_key]
    upper_ci = forecasts_volume_uci[forecast_key]

    # Get the corresponding train and test data
    train_data = train[f"{feature}_{brand}"]
    test_data = test[f"{feature}_{brand}"]

    # Select the appropriate axis for this brand
    ax = axs[brand - 1]

    # Plot the train, test, and forecast values
    ax.plot(train_data.index, train_data, label="Train", color='blue')
    ax.plot(test_data.index, test_data, label="Test", color='green')
    ax.plot(test_data.index, forecast, label="Forecast", color='red')
    ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')

    # Set labels and title
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    #ax.set_title(f"Brand {brand}")
    ax.legend()

    # Set x-axis limits
    ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))

# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
No description has been provided for this image

We got a MAE of approx. 0.7, which is a great improvement from the naive models. This is a very simple model that only uses the past target values, and we can see how its forecast work very well for many targets (and also shows the confidence intervals). Remember that ETS has an important season component, so although our test size is small compared to the final forecast, its performance probably will not decay significantly with the 52 weeks.

SARIMA¶

Apart from ETS, we are also going to use an ARIMA model. As we said, we want to use seasonality so we will use SARIMA. For selecting the parameters, we will run auto_arima for each of the targets so we can select the best for them, because targets natures seem to be different between them.

Testing for one target¶

In [41]:
feature = 'Share_value'
brand = "1"

model_arima = pm.auto_arima(
    train[f"{feature}_{brand}"],
    seasonal=True,
    m=52,
    D=0,
    trace=True,
    error_action='ignore',
    suppress_warnings=True,
    stepwise=True
)
Performing stepwise search to minimize aic
 ARIMA(2,0,2)(1,0,1)[52] intercept   : AIC=321.856, Time=5.13 sec
 ARIMA(0,0,0)(0,0,0)[52] intercept   : AIC=353.292, Time=0.01 sec
 ARIMA(1,0,0)(1,0,0)[52] intercept   : AIC=317.351, Time=1.84 sec
 ARIMA(0,0,1)(0,0,1)[52] intercept   : AIC=321.811, Time=0.62 sec
 ARIMA(0,0,0)(0,0,0)[52]             : AIC=869.689, Time=0.01 sec
 ARIMA(1,0,0)(0,0,0)[52] intercept   : AIC=315.421, Time=0.04 sec
 ARIMA(1,0,0)(0,0,1)[52] intercept   : AIC=317.361, Time=2.91 sec
 ARIMA(1,0,0)(1,0,1)[52] intercept   : AIC=inf, Time=2.80 sec
 ARIMA(2,0,0)(0,0,0)[52] intercept   : AIC=314.883, Time=0.05 sec
 ARIMA(2,0,0)(1,0,0)[52] intercept   : AIC=315.948, Time=3.28 sec
 ARIMA(2,0,0)(0,0,1)[52] intercept   : AIC=316.025, Time=1.86 sec
 ARIMA(2,0,0)(1,0,1)[52] intercept   : AIC=317.970, Time=1.60 sec
 ARIMA(3,0,0)(0,0,0)[52] intercept   : AIC=316.883, Time=0.04 sec
 ARIMA(2,0,1)(0,0,0)[52] intercept   : AIC=316.884, Time=0.05 sec
 ARIMA(1,0,1)(0,0,0)[52] intercept   : AIC=315.176, Time=0.03 sec
 ARIMA(3,0,1)(0,0,0)[52] intercept   : AIC=318.881, Time=0.04 sec
 ARIMA(2,0,0)(0,0,0)[52]             : AIC=inf, Time=0.01 sec

Best model:  ARIMA(2,0,0)(0,0,0)[52] intercept
Total fit time: 20.337 seconds
In [42]:
forecast, conf_int = model_arima.predict(8, return_conf_int=True, alpha=0.05)
lower_ci = conf_int[:,0]
upper_ci = conf_int[:,1]

plt.figure(figsize=(12, 4))
plt.plot(train.index, train[f"{feature}_{brand}"], label="Train")
plt.plot(test.index, test[f"{feature}_{brand}"], label="Test")
plt.plot(test.index, forecast, label="Forecast")
plt.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
plt.xlabel("Date")
plt.ylabel(f"{feature}_{brand}")
plt.legend()
plt.grid()
plt.show()
No description has been provided for this image

Forecasting all targets¶

In [43]:
maes = []
forecasts_value = {}
forecasts_value_lci = {}
forecasts_value_uci = {}
forecasts_volume = {}
forecasts_volume_lci = {}
forecasts_volume_uci = {}

for feature in targets:
    for brand in range(1, 8+1):
        model_arima = pm.auto_arima(
            train[f"{feature}_{brand}"],
            seasonal=True,
            m=52,
            D=0,
            trace=False,
            error_action='ignore',
            suppress_warnings=True,
            stepwise=True
        )
        forecast, conf_int = model_arima.predict(8, return_conf_int=True, alpha=0.05)
        lower_ci = conf_int[:,0]
        upper_ci = conf_int[:,1]
        if feature == 'Share_value':
            forecasts_value[f"{feature}_{brand}"] = forecast
            forecasts_value_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_value_uci[f"{feature}_{brand}"] = upper_ci
        else:
            forecasts_volume[f"{feature}_{brand}"] = forecast
            forecasts_volume_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_volume_uci[f"{feature}_{brand}"] = upper_ci
        maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
In [44]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['SARIMA', 'MAE'] = mae
MAE = 1.085
In [45]:
# Create a 4x2 subplot for Share_value forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten()  # Flatten the array of axes for easier indexing

# Loop through each brand and plot the forecasts
for brand in range(1, 9):  # Brands are 1 to 8
    # Define the feature and the key for the forecast
    feature = "Share_value"
    forecast_key = f"{feature}_{brand}"

    # Get the forecast for the current brand
    forecast = forecasts_value[forecast_key]
    lower_ci = forecasts_value_lci[forecast_key]
    upper_ci = forecasts_value_uci[forecast_key]

    # Get the corresponding train and test data
    train_data = train[f"{feature}_{brand}"]
    test_data = test[f"{feature}_{brand}"]

    # Select the appropriate axis for this brand
    ax = axs[brand - 1]

    # Plot the train, test, and forecast values
    ax.plot(train_data.index, train_data, label="Train", color='blue')
    ax.plot(test_data.index, test_data, label="Test", color='green')
    ax.plot(test_data.index, forecast, label="Forecast", color='red')
    ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')

    # Set labels and title
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    #ax.set_title(f"Brand {brand}")
    ax.legend()

    # Set x-axis limits
    ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))

# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
No description has been provided for this image
In [46]:
# Create a 4x2 subplot for Share_volume forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten()  # Flatten the array of axes for easier indexing

# Loop through each brand and plot the forecasts
for brand in range(1, 9):  # Brands are 1 to 8
    # Define the feature and the key for the forecast
    feature = "Share_volume"
    forecast_key = f"{feature}_{brand}"

    # Get the forecast for the current brand
    forecast = forecasts_volume[forecast_key]
    lower_ci = forecasts_volume_lci[forecast_key]
    upper_ci = forecasts_volume_uci[forecast_key]

    # Get the corresponding train and test data
    train_data = train[f"{feature}_{brand}"]
    test_data = test[f"{feature}_{brand}"]

    # Select the appropriate axis for this brand
    ax = axs[brand - 1]

    # Plot the train, test, and forecast values
    ax.plot(train_data.index, train_data, label="Train", color='blue')
    ax.plot(test_data.index, test_data, label="Test", color='green')
    ax.plot(test_data.index, forecast, label="Forecast", color='red')
    ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')

    # Set labels and title
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    #ax.set_title(f"Brand {brand}")
    ax.legend()

    # Set x-axis limits
    ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))

# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
No description has been provided for this image

Although SARIMA works well for some targets, it is not capable of predicting very well others, so we can clearly see that ETS is the best automatic model. SARIMA also needed much more computation time.

Adding Exogenous Variables¶

We have used models that only use the past values for each target. However, in our dataset we have the DP and Price predictors, which show correlation with the targets and can improve the results.

Predictors analysis¶

Let's first do an analysis of the predictors.

Correlations¶

Let's do a quick and final analysis of the correlations between the features:

Correlation with targets¶

In [131]:
# Create the heatmap
plt.figure(figsize=(18, 12))
sns.heatmap(train.corr(), annot=False, cmap='coolwarm', fmt='.1f', linewidths=0.5)

# Display the plot
plt.title("Correlation Heatmap")
plt.show()
No description has been provided for this image

Although this is a quite difficult to understand heatmap, we can extract a key insight: our predictors (DP, Price, values from last year and time features) show high correlations with many targets, and this correlations are usually between brands, so we need to find which predictors are significant for each one. We will do this in this section with the dynamic regression model.

Lagged Variables¶

As a final check, let's see if there is high correlation for the lags of the predictors. For this, we will compute the lags 1, 2 and 4, and plot them in a 4 different heatmaps (compared to the lag 0 values). Our objective is to see if some values of the correlations increase with the lags.

In [48]:
lags = [1, 2, 4]
df_lag = df.copy()
for lag in lags:
    for i in range(1, 9):  # 8 brands
        df_lag[f'DP_{i}_lag{lag}'] = df_lag[f'DP_{i}'].shift(lag)
        df_lag[f'Price_{i}_lag{lag}'] = df_lag[f'Price_{i}'].shift(lag)
In [49]:
# Lag 0
plt.figure(figsize=(10, 6))
# Select columns for the current lag
cols = [f'DP_{i}' for i in range(1, 9)] + \
       [f'Price_{i}' for i in range(1, 9)] + \
       [f'Share_value_{i}' for i in range(1, 9)] + \
       [f'Share_volume_{i}' for i in range(1, 9)]
df_subset = df_lag[cols]
# Compute correlation
corr_matrix = df_subset.corr()
# Plot heatmap
sns.heatmap(corr_matrix, annot=False, fmt=".1f", cmap="coolwarm", linewidths=0.5)
plt.title(f'Correlation Heatmap for Lag 0')
plt.show()

# Plot correlation heatmap for each lag
for lag in lags:

    plt.figure(figsize=(10, 6))
    
    # Select columns for the current lag
    cols = [f'DP_{i}_lag{lag}' for i in range(1, 9)] + \
           [f'Price_{i}_lag{lag}' for i in range(1, 9)] + \
           [f'Share_value_{i}' for i in range(1, 9)] + \
           [f'Share_volume_{i}' for i in range(1, 9)]
    
    df_subset = df_lag[cols]
    
    # Compute correlation
    corr_matrix = df_subset.corr()
    
    # Plot heatmap
    sns.heatmap(corr_matrix, annot=False, fmt=".1f", cmap="coolwarm", linewidths=0.5)
    plt.title(f'Correlation Heatmap for Lag {lag}')
    plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

As we can see, the bigger the lags the less important is the correlation. This is not relevant for us because, as we said, we are taking profit of the seasonality and not these small lags, so our seasonal models are probably the best options.

Forecasting Predictors¶

As we are going to use the predictors for forecasting the targets, we need to forecast these. The most effective automatic method was ETS, so we are going to use it to forecast the DP and Price features. These means that our final exogenous variables are going to be:

  • Forecasted DP and Price features
  • Week_number, Month, Year
  • Value last year

Now, forecast DP and Price:

In [51]:
train.columns
Out[51]:
Index(['DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
       'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
       'Price_7', 'Price_8', 'Share_value_1', 'Share_value_2', 'Share_value_3',
       'Share_value_4', 'Share_value_5', 'Share_value_6', 'Share_value_7',
       'Share_value_8', 'Share_volume_1', 'Share_volume_2', 'Share_volume_3',
       'Share_volume_4', 'Share_volume_5', 'Share_volume_6', 'Share_volume_7',
       'Share_volume_8', 'Week_number', 'Month', 'Year', 'Share_value_1_ly',
       'Share_volume_1_ly', 'Share_value_2_ly', 'Share_volume_2_ly',
       'Share_value_3_ly', 'Share_volume_3_ly', 'Share_value_4_ly',
       'Share_volume_4_ly', 'Share_value_5_ly', 'Share_volume_5_ly',
       'Share_value_6_ly', 'Share_volume_6_ly', 'Share_value_7_ly',
       'Share_volume_7_ly', 'Share_value_8_ly', 'Share_volume_8_ly'],
      dtype='object')
In [52]:
test.columns
Out[52]:
Index(['Share_value_1', 'Share_value_2', 'Share_value_3', 'Share_value_4',
       'Share_value_5', 'Share_value_6', 'Share_value_7', 'Share_value_8',
       'Share_volume_1', 'Share_volume_2', 'Share_volume_3', 'Share_volume_4',
       'Share_volume_5', 'Share_volume_6', 'Share_volume_7', 'Share_volume_8',
       'Week_number', 'Month', 'Year', 'Share_value_1_ly', 'Share_volume_1_ly',
       'Share_value_2_ly', 'Share_volume_2_ly', 'Share_value_3_ly',
       'Share_volume_3_ly', 'Share_value_4_ly', 'Share_volume_4_ly',
       'Share_value_5_ly', 'Share_volume_5_ly', 'Share_value_6_ly',
       'Share_volume_6_ly', 'Share_value_7_ly', 'Share_volume_7_ly',
       'Share_value_8_ly', 'Share_volume_8_ly'],
      dtype='object')
In [ ]:
maes = []
forecasts_dp = {}
forecasts_dp_lci = {}
forecasts_dp_uci = {}
forecasts_price = {}
forecasts_price_lci = {}
forecasts_price_uci = {}

features = ['DP', 'Price']

for feature in features:
    for brand in range(1, 8+1):
        model_ets = ETSModel(
            train[f"{feature}_{brand}"],
            trend='add',
            seasonal='add',
            seasonal_periods=52
        ).fit()
        pred = model_ets.get_prediction(start =  test.index[0], end = test.index[-1])
        cis = pred.pred_int(alpha = .05) #confidence interval
        lower_ci = cis['lower PI (alpha=0.050000)']
        upper_ci = cis['upper PI (alpha=0.050000)']
        forecast = model_ets.forecast(steps=len(test))
        test.loc[:, f"{feature}_{brand}"] = forecast
        if feature == 'DP':
            forecasts_dp[f"{feature}_{brand}"] = forecast
            forecasts_dp_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_dp_uci[f"{feature}_{brand}"] = upper_ci
        else:
            forecasts_price[f"{feature}_{brand}"] = forecast
            forecasts_price_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_price_uci[f"{feature}_{brand}"] = upper_ci
        maes.append(mean_absolute_error(dp_price_true_df[f"{feature}_{brand}"], forecast))
In [54]:
test.columns
Out[54]:
Index(['Share_value_1', 'Share_value_2', 'Share_value_3', 'Share_value_4',
       'Share_value_5', 'Share_value_6', 'Share_value_7', 'Share_value_8',
       'Share_volume_1', 'Share_volume_2', 'Share_volume_3', 'Share_volume_4',
       'Share_volume_5', 'Share_volume_6', 'Share_volume_7', 'Share_volume_8',
       'Week_number', 'Month', 'Year', 'Share_value_1_ly', 'Share_volume_1_ly',
       'Share_value_2_ly', 'Share_volume_2_ly', 'Share_value_3_ly',
       'Share_volume_3_ly', 'Share_value_4_ly', 'Share_volume_4_ly',
       'Share_value_5_ly', 'Share_volume_5_ly', 'Share_value_6_ly',
       'Share_volume_6_ly', 'Share_value_7_ly', 'Share_volume_7_ly',
       'Share_value_8_ly', 'Share_volume_8_ly', 'DP_1', 'DP_2', 'DP_3', 'DP_4',
       'DP_5', 'DP_6', 'DP_7', 'DP_8', 'Price_1', 'Price_2', 'Price_3',
       'Price_4', 'Price_5', 'Price_6', 'Price_7', 'Price_8'],
      dtype='object')
In [55]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
MAE = 1.218
In [56]:
# Create a 4x2 subplot for DP forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten()  # Flatten the array of axes for easier indexing

# Loop through each brand and plot the forecasts
for brand in range(1, 9):  # Brands are 1 to 8
    # Define the feature and the key for the forecast
    feature = "DP"
    forecast_key = f"{feature}_{brand}"

    # Get the forecast for the current brand
    forecast = forecasts_dp[forecast_key]
    lower_ci = forecasts_dp_lci[forecast_key]
    upper_ci = forecasts_dp_uci[forecast_key]

    # Get the corresponding train and test data
    train_data = train[f"{feature}_{brand}"]
    test_data = dp_price_true_df[f"{feature}_{brand}"]

    # Select the appropriate axis for this brand
    ax = axs[brand - 1]

    # Plot the train, test, and forecast values
    ax.plot(train_data.index, train_data, label="Train", color='blue')
    ax.plot(test_data.index, test_data, label="Test", color='green')
    ax.plot(test_data.index, forecast, label="Forecast", color='red')
    ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')

    # Set labels and title
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    #ax.set_title(f"Brand {brand}")
    ax.legend()

    # Set x-axis limits
    ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))

# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
No description has been provided for this image
In [57]:
# Create a 4x2 subplot for Price forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten()  # Flatten the array of axes for easier indexing

# Loop through each brand and plot the forecasts
for brand in range(1, 9):  # Brands are 1 to 8
    # Define the feature and the key for the forecast
    feature = "Price"
    forecast_key = f"{feature}_{brand}"

    # Get the forecast for the current brand
    forecast = forecasts_price[forecast_key]
    lower_ci = forecasts_price_lci[forecast_key]
    upper_ci = forecasts_price_uci[forecast_key]

    # Get the corresponding train and test data
    train_data = train[f"{feature}_{brand}"]
    test_data = dp_price_true_df[f"{feature}_{brand}"]

    # Select the appropriate axis for this brand
    ax = axs[brand - 1]

    # Plot the train, test, and forecast values
    ax.plot(train_data.index, train_data, label="Train", color='blue')
    ax.plot(test_data.index, test_data, label="Test", color='green')
    ax.plot(test_data.index, forecast, label="Forecast", color='red')
    ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')

    # Set labels and title
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    #ax.set_title(f"Brand {brand}")
    ax.legend()

    # Set x-axis limits
    ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))

# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
No description has been provided for this image

We can see how the ETS model does a good job of forecasting these variables. However, now we cannot take the confidence intervals of the future models, as it will not take into account this uncertainty. We will see if forecasting DP and Price with ETS and doing the forecast for the targets with another model is better for some target.

Now, with all of the predictors computed, we can do the simplest possible model for them: dynamic regression. This model is also very important because it will help us to determine which predictors are significant for each target (as we saw in the correlation heatmap, it is not trivial and changes a lot from target to target).

Testing Dynamic Regression¶

For Share_value_1:

For predicting each target, we will use all the predictors (DP and Price features, Week_number, Month and Year) and its value from the last year (to gain information from seasonality). As Linear Regression needs non-NaN values, we will drop the corresponding rows of the train dataset (the first year of the two). However, this method does not need the two full years for taking into account seasonality, so we can do it.

In [58]:
feature = 'Share_value'
brand = "1"

predictors = [
    'DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
    'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
    'Price_7', 'Price_8', 'Week_number', 'Month', 'Year', f"{feature}_{brand}_ly"
]

# Drop rows with missing values
train_model = train[[f"{feature}_{brand}"] + predictors].dropna()

# Define the predictors (X) and target variable (y)
X_train_wi = train_model[predictors]
y_train = train_model[f"{feature}_{brand}"]

# Add a constant for the intercept
X_train = sm.add_constant(X_train_wi)
In [59]:
# Fit the OLS regression model
model = sm.OLS(y_train, X_train).fit()

And from the model summary:

In [60]:
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:          Share_value_1   R-squared:                       0.677
Model:                            OLS   Adj. R-squared:                  0.469
Method:                 Least Squares   F-statistic:                     3.249
Date:                Tue, 18 Mar 2025   Prob (F-statistic):            0.00161
Time:                        18:02:54   Log-Likelihood:                -56.761
No. Observations:                  52   AIC:                             155.5
Df Residuals:                      31   BIC:                             196.5
Df Model:                          20                                         
Covariance Type:            nonrobust                                         
====================================================================================
                       coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------
const            -1156.4549   3417.263     -0.338      0.737   -8126.008    5813.098
DP_1                 0.3201      0.355      0.902      0.374      -0.403       1.044
DP_2                 0.1311      0.118      1.114      0.274      -0.109       0.371
DP_3                -0.3677      0.314     -1.171      0.251      -1.008       0.273
DP_4                -0.0554      0.142     -0.389      0.700      -0.345       0.235
DP_5                -0.1684      0.161     -1.048      0.303      -0.496       0.159
DP_6                -0.3287      0.150     -2.185      0.037      -0.636      -0.022
DP_7                -0.0296      0.155     -0.191      0.850      -0.345       0.286
DP_8                 0.1883      0.188      1.001      0.324      -0.195       0.572
Price_1             -0.9878      0.514     -1.922      0.064      -2.036       0.060
Price_2             -0.0600      0.174     -0.344      0.733      -0.416       0.296
Price_3              0.8338      0.674      1.238      0.225      -0.540       2.208
Price_4              0.2848      0.398      0.715      0.480      -0.528       1.097
Price_5              1.3035      0.563      2.316      0.027       0.155       2.452
Price_6             -0.2788      0.541     -0.515      0.610      -1.383       0.825
Price_7              0.3696      2.018      0.183      0.856      -3.746       4.486
Price_8              0.2306      0.435      0.531      0.599      -0.656       1.117
Week_number          0.2617      0.118      2.214      0.034       0.021       0.503
Month               -1.0389      0.511     -2.034      0.051      -2.081       0.003
Year                 0.5677      1.691      0.336      0.739      -2.881       4.016
Share_value_1_ly     0.7739      0.173      4.486      0.000       0.422       1.126
==============================================================================
Omnibus:                        3.608   Durbin-Watson:                   1.765
Prob(Omnibus):                  0.165   Jarque-Bera (JB):                3.054
Skew:                           0.593   Prob(JB):                        0.217
Kurtosis:                       3.032   Cond. No.                     5.36e+07
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.36e+07. This might indicate that there are
strong multicollinearity or other numerical problems.

We can get the significant predictors (p-value lower than 0.05):

In [61]:
# Get the p-values and coefficients
p_values = model.pvalues
coefficients = model.params

# Select only the variables with p-value < 0.05
significant_vars = p_values[p_values < 0.05].index

# Create a dataframe with coefficients and p-values for significant variables
significant_results = pd.DataFrame({
    'Coefficient': coefficients[significant_vars],
    'P-Value': p_values[significant_vars]
})

# Display the significant variables
print(significant_results)
                  Coefficient   P-Value
DP_6                -0.328724  0.036569
Price_5              1.303498  0.027365
Week_number          0.261724  0.034287
Share_value_1_ly     0.773921  0.000093
In [62]:
# Select the same features as in training
X_test_wi = test[predictors]
y_test = test['Share_value_1']
X_test = sm.add_constant(X_test_wi)

y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error (MAE): {mae:.3f}")
Mean Absolute Error (MAE): 3.046

Dynamic Regression¶

Let's now apply these steps for all the targets:

In [63]:
maes = []

for feature in targets:

    for brand in range(1, 8+1):

        # Define predictors and fit model
        predictors = [
            'DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
            'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
            'Price_7', 'Price_8', 'Week_number', 'Month', 'Year', f"{feature}_{brand}_ly"
        ]

        train_model = train[[f"{feature}_{brand}"] + predictors].dropna()
        X_train_wi = train_model[predictors]
        y_train = train_model[f"{feature}_{brand}"]
        X_train = sm.add_constant(X_train_wi)

        model = sm.OLS(y_train, X_train).fit()


        # Significant predictors
        p_values = model.pvalues
        coefficients = model.params

        significant_vars = p_values[p_values < 0.05].index

        if significant_vars.empty:
            print(f"No significant predictors for {feature}_{brand}")
            print('\n------------------------\n')
        else:
            significant_results = pd.DataFrame({
                'Coefficient': coefficients[significant_vars],
                'P-Value': p_values[significant_vars]
            })
            print(f'\nSignificant predictors for {feature}_{brand}')
            print(significant_results)
            print('\n------------------------\n')
        

        # Forecast and compute the MAE for test set
        X_test_wi = test[predictors]
        y_test = test[f"{feature}_{brand}"]
        X_test = sm.add_constant(X_test_wi)

        forecast = model.predict(X_test)

        maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
Significant predictors for Share_value_1
                  Coefficient   P-Value
DP_6                -0.328724  0.036569
Price_5              1.303498  0.027365
Week_number          0.261724  0.034287
Share_value_1_ly     0.773921  0.000093

------------------------

No significant predictors for Share_value_2

------------------------


Significant predictors for Share_value_3
                  Coefficient   P-Value
const             7995.489163  0.022268
Year                -3.957843  0.022329
Share_value_3_ly     0.566296  0.001198

------------------------


Significant predictors for Share_value_4
         Coefficient   P-Value
DP_1       -0.214170  0.007096
DP_4        0.086901  0.006409
DP_6       -0.072968  0.028669
DP_8        0.130220  0.002626
Price_4    -0.315994  0.000563
Price_6    -0.438333  0.000634

------------------------


Significant predictors for Share_value_5
      Coefficient   P-Value
DP_5     0.178333  0.018617
DP_7    -0.165158  0.023943

------------------------


Significant predictors for Share_value_6
         Coefficient   P-Value
DP_1       -0.147637  0.021577
DP_2       -0.044495  0.037243
Price_5     0.295888  0.002403
Price_6    -0.243706  0.014038

------------------------


Significant predictors for Share_value_7
         Coefficient   P-Value
Price_5     0.592732  0.040998

------------------------


Significant predictors for Share_value_8
         Coefficient   P-Value
const    7539.338093  0.004068
DP_4       -0.221353  0.038661
Price_1     0.868701  0.019427
Price_8    -1.457629  0.000118
Year       -3.707115  0.004309

------------------------


Significant predictors for Share_volume_1
                   Coefficient   P-Value
Price_1              -1.133065  0.019110
Share_volume_1_ly     0.572975  0.004284

------------------------

No significant predictors for Share_volume_2

------------------------


Significant predictors for Share_volume_3
                    Coefficient   P-Value
const              11134.893686  0.008710
Year                  -5.501294  0.008845
Share_volume_3_ly      0.547793  0.000895

------------------------


Significant predictors for Share_volume_4
         Coefficient   P-Value
DP_1       -0.274382  0.013284
DP_4        0.125406  0.005124
DP_6       -0.093006  0.043688
DP_8        0.171104  0.004650
Price_4    -0.685840  0.000001
Price_6    -0.567452  0.001425

------------------------


Significant predictors for Share_volume_5
         Coefficient   P-Value
DP_5        0.139174  0.014882
DP_7       -0.126382  0.021369
Price_3     0.468320  0.047772
Price_5    -0.543435  0.004386

------------------------


Significant predictors for Share_volume_6
         Coefficient   P-Value
DP_1       -0.166473  0.014190
DP_2       -0.049924  0.027230
Price_5     0.313562  0.002226
Price_6    -0.406035  0.000310

------------------------

No significant predictors for Share_volume_7

------------------------


Significant predictors for Share_volume_8
         Coefficient   P-Value
const    3878.477935  0.035751
Price_1     0.630330  0.019844
Price_8    -1.342946  0.000004
Year       -1.900611  0.037612

------------------------

This is the corresponding MAE:

In [64]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['Dynamic Regression', 'MAE'] = mae
MAE = 1.222

We can see that a linear regression does not achieve a very good performance, showing that there is not much linearity. However, from the significant values of the regression, we can get what predictors are significant for each of the targets and store them in a dictionary for posterior use. In the case of features that do not have any significant predictors (or do not have any DP or Price predictors) we assigned the corresponding features for that brand so we can have some. Also, we did not add the _ly values, as the following models do not need them. We will add them for the models that can use them.

In [66]:
significant_predictors = {
    'Share_value_1': ['DP_6', 'Price_5', 'Week_number'],
    'Share_value_2': ['DP_2', 'Price_2', 'Week_number',],
    'Share_value_3': ['DP_3', 'Price_3'],
    'Share_value_4': ['DP_1', 'DP_4', 'DP_6', 'DP_8', 'Price_4', 'Price_6'],
    'Share_value_5': ['DP_5', 'DP_7'],
    'Share_value_6': ['DP_1', 'DP_2', 'Price_5', 'Price_6'],
    'Share_value_7': ['DP_7', 'Price_5', 'Price_7'],
    'Share_value_8': ['DP_4', 'Price_1', 'Price_8'],
    'Share_volume_1': ['DP_1', 'Price_1'],
    'Share_volume_2': ['DP_2', 'Price_2', 'Week_number'],
    'Share_volume_3': ['DP_3', 'Price_3'],
    'Share_volume_4': ['DP_1', 'DP_4', 'DP_6', 'DP_8', 'Price_4', 'Price_6'],
    'Share_volume_5': ['DP_5', 'DP_7', 'Price_3', 'Price_5'],
    'Share_volume_6': ['DP_1', 'DP_2', 'Price_5', 'Price_6'],
    'Share_volume_7': ['DP_7', 'Price_7', 'Week_number'],
    'Share_volume_8': ['DP_8', 'Price_1', 'Price_8']
}

We can see how, although dynamic regression did not show better performance than the automatic models, it has been key to identify the significant predictors. With them, we are going to develop more complex models now.

SARIMAX¶

Before continuing with the multivariate models, we are going to use the predictors in the SARIMA model (SARIMAX). We will compare using the predictors for each brand (which, as we saw, is not necessarily the best option) and the significant predictors that we just got.

SARIMAX for Brand¶

In [73]:
maes = []
forecasts_value = {}
forecasts_value_lci = {}
forecasts_value_uci = {}
forecasts_volume = {}
forecasts_volume_lci = {}
forecasts_volume_uci = {}

for feature in targets:
    for brand in range(1, 8+1):

        exog_train = train[[f'DP_{brand}', f'Price_{brand}', 'Week_number', 'Month']]
        exog_test = test[[f'DP_{brand}', f'Price_{brand}', 'Week_number', 'Month']]

        model_arima = pm.auto_arima(
            y=train[f"{feature}_{brand}"],
            X=exog_train,
            seasonal=True,
            m=52,
            D=0,
            trace=False,
            error_action='ignore',
            suppress_warnings=True,
            stepwise=True
        )
        forecast, conf_int = model_arima.predict(n_periods=8, X=exog_test, return_conf_int=True, alpha=0.05)
        lower_ci = conf_int[:,0]
        upper_ci = conf_int[:,1]
        if feature == 'Share_value':
            forecasts_value[f"{feature}_{brand}"] = forecast
            forecasts_value_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_value_uci[f"{feature}_{brand}"] = upper_ci
        else:
            forecasts_volume[f"{feature}_{brand}"] = forecast
            forecasts_volume_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_volume_uci[f"{feature}_{brand}"] = upper_ci
        maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
In [74]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['SARIMAX B', 'MAE'] = mae
MAE = 0.977

SARIMAX for Significant predictors¶

In [78]:
maes = []
forecasts_value = {}
forecasts_value_lci = {}
forecasts_value_uci = {}
forecasts_volume = {}
forecasts_volume_lci = {}
forecasts_volume_uci = {}

for feature in targets:
    for brand in range(1, 8+1):

        exog_train = train[significant_predictors[f'{feature}_{brand}']]
        exog_test = test[significant_predictors[f'{feature}_{brand}']]

        model_arima = pm.auto_arima(
            y=train[f"{feature}_{brand}"],
            X=exog_train,
            seasonal=True,
            m=52,
            D=0,
            trace=False,
            error_action='ignore',
            suppress_warnings=True,
            stepwise=True
        )
        forecast, conf_int = model_arima.predict(n_periods=8, X=exog_test, return_conf_int=True, alpha=0.05)
        lower_ci = conf_int[:,0]
        upper_ci = conf_int[:,1]
        if feature == 'Share_value':
            forecasts_value[f"{feature}_{brand}"] = forecast
            forecasts_value_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_value_uci[f"{feature}_{brand}"] = upper_ci
        else:
            forecasts_volume[f"{feature}_{brand}"] = forecast
            forecasts_volume_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_volume_uci[f"{feature}_{brand}"] = upper_ci
        maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
In [79]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['SARIMAX S', 'MAE'] = mae
MAE = 0.885

We can see how using the predictors improve the performance of the SARIMA model, specially the significant predictors that we got before, showing how they are relevant for the targets.

Multi-variate time-series¶

Now, we are going to use VAR models for predicting the targets at the same time. We want to focus of forecasting the two targets for the same brand together, as they showed a very high correlation.

Testing VAR model¶

Let's test the VAR model for Brand 1:

In [80]:
brand = "1"

targets = [
    f"Share_value_{brand}", f"Share_volume_{brand}"
]

train_var = train[
    targets
].dropna()
In [81]:
model = VAR(train_var)
lag_order = model.select_order(maxlags=4).aic
print(f"Optimal number of lags: {lag_order}")

# Fit the VAR model with the chosen lag order
model_fitted = model.fit(lag_order)

# Check the model summary
print(model_fitted.summary())
Optimal number of lags: 2
  Summary of Regression Results   
==================================
Model:                         VAR
Method:                        OLS
Date:           Wed, 19, Mar, 2025
Time:                     12:39:46
--------------------------------------------------------------------
No. of Equations:         2.00000    BIC:                   -1.79247
Nobs:                     102.000    HQIC:                  -1.94561
Log likelihood:          -174.923    FPE:                   0.128778
AIC:                     -2.04982    Det(Omega_mle):        0.117024
--------------------------------------------------------------------
Results for equation Share_value_1
====================================================================================
                       coefficient       std. error           t-stat            prob
------------------------------------------------------------------------------------
const                     9.626201         1.487693            6.471           0.000
L1.Share_value_1          1.145216         0.246741            4.641           0.000
L1.Share_volume_1        -0.647462         0.264028           -2.452           0.014
L2.Share_value_1         -0.989856         0.247527           -3.999           0.000
L2.Share_volume_1         0.920707         0.260581            3.533           0.000
====================================================================================

Results for equation Share_volume_1
====================================================================================
                       coefficient       std. error           t-stat            prob
------------------------------------------------------------------------------------
const                     7.756370         1.429650            5.425           0.000
L1.Share_value_1          0.108893         0.237114            0.459           0.646
L1.Share_volume_1         0.459479         0.253727            1.811           0.070
L2.Share_value_1         -0.615096         0.237870           -2.586           0.010
L2.Share_volume_1         0.548084         0.250415            2.189           0.029
====================================================================================

Correlation matrix of residuals
                  Share_value_1  Share_volume_1
Share_value_1          1.000000        0.926460
Share_volume_1         0.926460        1.000000



As we expected, the stats show how correlated both targets are for a certain brand.

In [82]:
# Forecast the next 4 weeks (for example)
forecast_values = model_fitted.forecast(train_var.values[-lag_order:], steps=8)

# Convert the forecast values to a DataFrame for better readability
forecast_df = pd.DataFrame(forecast_values, columns=targets)
print(forecast_df)
   Share_value_1  Share_volume_1
0      16.363970       13.954334
1      15.522940       13.204180
2      15.503979       13.096486
3      15.693819       13.151106
4      15.795478       13.149512
5      15.775304       13.073016
6      15.699635       12.972268
7      15.627747       12.888218
In [83]:
mae = mean_absolute_error(test[f"Share_value_{brand}"], forecast_df[f"Share_value_{brand}"])
print(f"Mean Absolute Error (MAE): {mae:.3f}")

mae = mean_absolute_error(test[f"Share_volume_{brand}"], forecast_df[f"Share_volume_{brand}"])
print(f"Mean Absolute Error (MAE): {mae:.3f}")
Mean Absolute Error (MAE): 0.816
Mean Absolute Error (MAE): 1.728

VAR¶

In this model, we will forecast together just the two targets for a brand:

In [84]:
for brand in range(1, 8+1):

    targets = [f"Share_value_{brand}", f"Share_volume_{brand}"]
    predictors = []
    train_var = train[
        targets + predictors
    ].dropna()

    model = VAR(train_var)
    lag_order = model.select_order(maxlags=10).aic
    model_fitted = model.fit(lag_order)

    forecast_values = model_fitted.forecast(train_var.values[-lag_order:], steps=8)
    forecast_df = pd.DataFrame(forecast_values, columns=targets+predictors)

    maes.append(mean_absolute_error(test[f"Share_value_{brand}"], forecast_df[f"Share_value_{brand}"]))
    maes.append(mean_absolute_error(test[f"Share_volume_{brand}"], forecast_df[f"Share_volume_{brand}"]))
In [85]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['VAR', 'MAE'] = mae
MAE = 1.129

Althogh it is a good performance, it is clearly lower than the best model until now (ETS model).

VARX for Brand¶

Does adding the predictors for the corresponding brand improve the results?

In [86]:
for brand in range(1, 8+1):

    targets = [f"Share_value_{brand}", f"Share_volume_{brand}"]
    predictors = [
        f'DP_{brand}', f'Price_{brand}', 'Week_number', 'Month', 
        f"Share_value_{brand}_ly", f"Share_volume_{brand}_ly"
    ]
    train_var = train[
        targets + predictors
    ].dropna()

    model = VAR(train_var)
    lag_order = model.select_order(maxlags=4).aic
    model_fitted = model.fit(lag_order)

    forecast_values = model_fitted.forecast(train_var.values[-lag_order:], steps=8)
    forecast_df = pd.DataFrame(forecast_values, columns=targets+predictors)

    maes.append(mean_absolute_error(test[f"Share_value_{brand}"], forecast_df[f"Share_value_{brand}"]))
    maes.append(mean_absolute_error(test[f"Share_volume_{brand}"], forecast_df[f"Share_volume_{brand}"]))
In [87]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['VARX B', 'MAE'] = mae
MAE = 1.339

No, it does not.

VARX for Significant predictors¶

And forecasting the significant predictors?

In [88]:
for brand in range(1, 8+1):

    targets = [f"Share_value_{brand}", f"Share_volume_{brand}"]
    predictors = list(set(
        significant_predictors[f'Share_value_{brand}'] + significant_predictors[f'Share_volume_{brand}']
    ))

    train_var = train[
        targets + predictors
    ].dropna()

    model = VAR(train_var)
    lag_order = model.select_order(maxlags=4).aic
    model_fitted = model.fit(lag_order)

    forecast_values = model_fitted.forecast(train_var.values[-lag_order:], steps=8)
    forecast_df = pd.DataFrame(forecast_values, columns=targets+predictors)

    maes.append(mean_absolute_error(test[f"Share_value_{brand}"], forecast_df[f"Share_value_{brand}"]))
    maes.append(mean_absolute_error(test[f"Share_volume_{brand}"], forecast_df[f"Share_volume_{brand}"]))
In [89]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['VARX S', 'MAE'] = mae
MAE = 1.372

It also does not, showing how VAR models for this data work the best by using just the highly correlated targets for a certain brand.

ML Models¶

The final models that we are going to train are based in ensemble tree models and will use the significant predictors that we got before, together with the value for each target for the lasy year to account for seasonality:

In [90]:
significant_predictors = {
    'Share_value_1': ['DP_1', 'DP_6', 'Price_1', 'Price_5', 'Week_number', 'Month', 'Share_value_1_ly'],
    'Share_value_2': ['DP_2', 'Price_2', 'Week_number', 'Month', 'Share_value_2_ly'],
    'Share_value_3': ['DP_3', 'Price_3', 'Week_number', 'Month', 'Share_value_3_ly'],
    'Share_value_4': ['DP_4', 'DP_6', 'DP_8', 'Price_4', 'Price_6', 'Week_number', 'Month', 'Share_value_4_ly'],
    'Share_value_5': ['DP_5', 'DP_7', 'Price_5', 'Week_number', 'Month', 'Share_value_5_ly'],
    'Share_value_6': ['DP_2', 'DP_6', 'Price_5', 'Price_6', 'Week_number', 'Month', 'Share_value_6_ly'],
    'Share_value_7': ['DP_7', 'Price_5', 'Price_7', 'Week_number', 'Month', 'Share_value_7_ly'],
    'Share_value_8': ['DP_4', 'DP_8', 'Price_1', 'Price_8', 'Week_number', 'Month', 'Share_value_8_ly'],
    'Share_volume_1': ['DP_1', 'Price_1', 'Week_number', 'Month', 'Share_volume_1_ly'],
    'Share_volume_2': ['DP_2', 'Price_2', 'Week_number', 'Month', 'Share_volume_2_ly'],
    'Share_volume_3': ['DP_3', 'Price_3', 'Week_number', 'Month', 'Share_volume_3_ly'],
    'Share_volume_4': ['DP_1', 'DP_4', 'DP_6', 'DP_8', 'Price_4', 'Price_6', 'Week_number', 'Month', 'Share_volume_4_ly'],
    'Share_volume_5': ['DP_5', 'DP_7', 'Price_3', 'Price_5', 'Week_number', 'Month', 'Share_volume_5_ly'],
    'Share_volume_6': ['DP_1', 'DP_2', 'DP_6', 'Price_5', 'Price_6', 'Week_number', 'Month', 'Share_volume_6_ly'],
    'Share_volume_7': ['DP_7', 'Price_7', 'Week_number', 'Month', 'Share_volume_7_ly'],
    'Share_volume_8': ['DP_8', 'Price_1', 'Price_8', 'Week_number', 'Month', 'Share_volume_8_ly']
}

These models allow us to use instances with NaN values, so we can use all of our training dataset because the NaN values from the _ly predictors are not significant. We have checked that the best results come from using all instances and not dropping the NaN values, as we had to do in the dynamic regression.

RandomForest¶

The first model will be a Random Forest. For each target we will perform HPO to select the parameters that give the lowest MAE:

In [91]:
targets = ['Share_value', 'Share_volume']

# Define hyperparameter grid
param_grid = {
    'n_estimators': [100, 200, 300],  # Number of trees
    'max_depth': [3, 5, 7],  # Depth of trees
    'min_samples_split': [2, 5, 10],  # Minimum samples required to split an internal node
    'min_samples_leaf': [1, 2, 4],  # Minimum samples required to be at a leaf node
}
In [92]:
maes = []
maes_rf = {}
hp_rf = {}
forecasts_rf = {}

for feature in targets:
    
    for brand in range(1, 8+1):

        print(f'{feature}_{brand}')

        # To store the best model and its MAE
        best_mae = np.inf
        best_params = None
        best_model = None

        # Define train and test
        X_train = train[significant_predictors[f'{feature}_{brand}']]
        y_train = train[f'{feature}_{brand}']
        X_test = test[significant_predictors[f'{feature}_{brand}']]
        y_test = test[f'{feature}_{brand}']

        # Iterate over all combinations of hyperparameters
        for n_estimators in param_grid['n_estimators']:
            for max_depth in param_grid['max_depth']:
                for min_samples_split in param_grid['min_samples_split']:
                    for min_samples_leaf in param_grid['min_samples_leaf']:
            
                        # Define model and fit
                        model_rf = RandomForestRegressor(
                            n_estimators=n_estimators,
                            max_depth=max_depth,
                            min_samples_split=min_samples_split,
                            min_samples_leaf=min_samples_leaf,
                            random_state=100531899
                        )

                        model_rf.fit(X_train, y_train)

                        # Predict and calculate MAE
                        y_pred = model_rf.predict(X_test)
                        mae = mean_absolute_error(y_test, y_pred)

                        # Check if this is the best model
                        if mae < best_mae:
                            best_mae = mae
                            best_params = {
                                'n_estimators': n_estimators,
                                'max_depth': max_depth,
                                'min_samples_split': min_samples_split,
                                'min_samples_leaf': min_samples_leaf
                                }
                            best_model = model_rf
                            forecasts_rf[f"{feature}_{brand}"] = y_pred

        # After completing the grid search
        print(f"Best MAE for {feature}_{brand}: {best_mae:.3f}")
        print(f"Best hyperparameters for {feature}_{brand}:", best_params)

        maes.append(best_mae)

        maes_rf[f'{feature}_{brand}'] = best_mae
        hp_rf[f'{feature}_{brand}'] = best_params
Share_value_1
Best MAE for Share_value_1: 0.647
Best hyperparameters for Share_value_1: {'n_estimators': 100, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 1}
Share_value_2
Best MAE for Share_value_2: 0.754
Best hyperparameters for Share_value_2: {'n_estimators': 200, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 4}
Share_value_3
Best MAE for Share_value_3: 1.277
Best hyperparameters for Share_value_3: {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 10, 'min_samples_leaf': 1}
Share_value_4
Best MAE for Share_value_4: 0.208
Best hyperparameters for Share_value_4: {'n_estimators': 300, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 4}
Share_value_5
Best MAE for Share_value_5: 0.550
Best hyperparameters for Share_value_5: {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 5, 'min_samples_leaf': 1}
Share_value_6
Best MAE for Share_value_6: 0.344
Best hyperparameters for Share_value_6: {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 2, 'min_samples_leaf': 2}
Share_value_7
Best MAE for Share_value_7: 0.784
Best hyperparameters for Share_value_7: {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 2, 'min_samples_leaf': 1}
Share_value_8
Best MAE for Share_value_8: 0.603
Best hyperparameters for Share_value_8: {'n_estimators': 100, 'max_depth': 3, 'min_samples_split': 5, 'min_samples_leaf': 1}
Share_volume_1
Best MAE for Share_volume_1: 0.446
Best hyperparameters for Share_volume_1: {'n_estimators': 100, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 1}
Share_volume_2
Best MAE for Share_volume_2: 0.448
Best hyperparameters for Share_volume_2: {'n_estimators': 300, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 4}
Share_volume_3
Best MAE for Share_volume_3: 2.666
Best hyperparameters for Share_volume_3: {'n_estimators': 100, 'max_depth': 7, 'min_samples_split': 10, 'min_samples_leaf': 1}
Share_volume_4
Best MAE for Share_volume_4: 0.256
Best hyperparameters for Share_volume_4: {'n_estimators': 300, 'max_depth': 5, 'min_samples_split': 2, 'min_samples_leaf': 4}
Share_volume_5
Best MAE for Share_volume_5: 0.754
Best hyperparameters for Share_volume_5: {'n_estimators': 100, 'max_depth': 3, 'min_samples_split': 10, 'min_samples_leaf': 4}
Share_volume_6
Best MAE for Share_volume_6: 0.295
Best hyperparameters for Share_volume_6: {'n_estimators': 300, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 1}
Share_volume_7
Best MAE for Share_volume_7: 1.317
Best hyperparameters for Share_volume_7: {'n_estimators': 200, 'max_depth': 5, 'min_samples_split': 2, 'min_samples_leaf': 1}
Share_volume_8
Best MAE for Share_volume_8: 0.647
Best hyperparameters for Share_volume_8: {'n_estimators': 200, 'max_depth': 3, 'min_samples_split': 2, 'min_samples_leaf': 1}
In [93]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['RandomForest', 'MAE'] = mae
MAE = 0.750

The performance is very good and similar to the ETS models. We will then analyze more detailedly the MAE for each target for the best models, so we can select the best model for each one.

XGBoost¶

The final model will be similar to the Random Forest one but using the popular and effective XGBoost:

XGBoost (16 features)¶

In [96]:
targets = ['Share_value', 'Share_volume']

# Define hyperparameter grid
param_grid = {
    'n_estimators': [100, 200, 300],  # Number of trees
    'max_depth': [3, 5, 7],  # Depth of trees
    'colsample_bytree': [0.6, 0.8, 1.0],  # Fraction of features for each tree
    'subsample': [0.6, 0.8, 1.0],  # Fraction of samples for each tree
    'learning_rate': [0.01, 0.1, 0.3],  # Learning rate
}
In [ ]:
maes = []

for feature in targets:
    
    for brand in range(1, 8+1):

        print(f'{feature}_{brand}')

        # To store the best model and its MAE
        best_mae = np.inf
        best_params = None
        best_model = None

        # Define train and test
        X_train = train[significant_predictors[f'{feature}_{brand}']]
        y_train = train[f'{feature}_{brand}']
        X_test = test[significant_predictors[f'{feature}_{brand}']]
        y_test = test[f'{feature}_{brand}']

        # Iterate over all combinations of hyperparameters
        for n_estimators in param_grid['n_estimators']:
            for max_depth in param_grid['max_depth']:
                for colsample_bytree in param_grid['colsample_bytree']:
                    for subsample in param_grid['subsample']:
                        for learning_rate in param_grid['learning_rate']:
            
                            # Define model and fit
                            model_xgb = xgb.XGBRegressor(
                                n_estimators=n_estimators,
                                max_depth=max_depth,
                                colsample_bytree=colsample_bytree,
                                subsample=subsample,
                                learning_rate=learning_rate,
                                random_state=100531899
                            )

                            model_xgb.fit(X_train, y_train)

                            # Predict and calculate MAE
                            y_pred = model_xgb.predict(X_test)
                            mae = mean_absolute_error(y_test, y_pred)

                            # Check if this is the best model
                            if mae < best_mae:
                                best_mae = mae
                                best_params = {
                                    'learning_rate': learning_rate,
                                    'max_depth': max_depth,
                                    'n_estimators': n_estimators,
                                    'colsample_bytree': colsample_bytree,
                                    'subsample': subsample
                                }
                                best_model = model_xgb

        # After completing the grid search
        print(f"Best MAE for {feature}_{brand}: {best_mae:.3f}")
        print(f"Best hyperparameters for {feature}_{brand}:", best_params)

        maes.append(best_mae)
Share_value_1
Best MAE for Share_value_1: 0.321
Best hyperparameters for Share_value_1: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 0.8}
Share_value_2
Best MAE for Share_value_2: 0.294
Best hyperparameters for Share_value_2: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 0.6}
Share_value_3
Best MAE for Share_value_3: 1.856
Best hyperparameters for Share_value_3: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 200, 'colsample_bytree': 0.6, 'subsample': 0.6}
Share_value_4
Best MAE for Share_value_4: 0.178
Best hyperparameters for Share_value_4: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 1.0}
Share_value_5
Best MAE for Share_value_5: 0.466
Best hyperparameters for Share_value_5: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 0.6}
Share_value_6
Best MAE for Share_value_6: 0.284
Best hyperparameters for Share_value_6: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 0.6}
Share_value_7
Best MAE for Share_value_7: 0.612
Best hyperparameters for Share_value_7: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 1.0}
Share_value_8
Best MAE for Share_value_8: 0.594
Best hyperparameters for Share_value_8: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 1.0}
Share_volume_1
Best MAE for Share_volume_1: 0.281
Best hyperparameters for Share_volume_1: {'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 200, 'colsample_bytree': 1.0, 'subsample': 1.0}
Share_volume_2
Best MAE for Share_volume_2: 0.232
Best hyperparameters for Share_volume_2: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 200, 'colsample_bytree': 0.8, 'subsample': 1.0}
Share_volume_3
Best MAE for Share_volume_3: 2.943
Best hyperparameters for Share_volume_3: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.6, 'subsample': 0.6}
Share_volume_4
Best MAE for Share_volume_4: 0.232
Best hyperparameters for Share_volume_4: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.8}
Share_volume_5
Best MAE for Share_volume_5: 0.696
Best hyperparameters for Share_volume_5: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 1.0, 'subsample': 1.0}
Share_volume_6
Best MAE for Share_volume_6: 0.194
Best hyperparameters for Share_volume_6: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 300, 'colsample_bytree': 1.0, 'subsample': 0.8}
Share_volume_7
Best MAE for Share_volume_7: 1.111
Best hyperparameters for Share_volume_7: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 1.0}
Share_volume_8
Best MAE for Share_volume_8: 0.583
Best hyperparameters for Share_volume_8: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.6}
In [98]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['XGBoost', 'MAE'] = mae
MAE = 0.680

This model gave the best result until now. As it is promising, we can see if changing the approach with it can help us lower the MAE even more. Instead of predicting the 16 features individually, we will take into account the high correlation between the share value and share volume for each brand by predicting first one of it and using this prediction as training for next one.

XGBoost 2-Step (Value -> Volume)¶

First, we will try to predict value and add this forecast to the training for volume. For each brand, we will predict the value as usual and then we will train the model for the volume with this new column:

In [99]:
maes = []
maes_xgb = {}
hp_xgb = {}
forecasts_xgb = {}

for brand in range(1, 8+1):

    # Define train and test for Share_value
    X_train = train[significant_predictors[f'Share_value_{brand}']]
    y_train = train[f'Share_value_{brand}']
    X_test = test[significant_predictors[f'Share_value_{brand}']]
    y_test = test[f'Share_value_{brand}']

    # To store the best model and its MAE for Share_value
    best_mae_share_value = np.inf
    best_params_share_value = None
    best_model_share_value = None

    print(f'Predicting Share_value for Brand {brand}')
    
    # Iterate over all combinations of hyperparameters for Share_value
    for n_estimators in param_grid['n_estimators']:
        for max_depth in param_grid['max_depth']:
            for colsample_bytree in param_grid['colsample_bytree']:
                for subsample in param_grid['subsample']:
                    for learning_rate in param_grid['learning_rate']:
        
                        # Define model for Share_value
                        model_xgb = xgb.XGBRegressor(
                            n_estimators=n_estimators,
                            max_depth=max_depth,
                            colsample_bytree=colsample_bytree,
                            subsample=subsample,
                            learning_rate=learning_rate,
                            random_state=100531899
                        )

                        model_xgb.fit(X_train, y_train)

                        # Predict and calculate MAE for Share_value
                        y_pred = model_xgb.predict(X_test)
                        mae = mean_absolute_error(y_test, y_pred)

                        # Check if this is the best model for Share_value
                        if mae < best_mae_share_value:
                            best_mae_share_value = mae
                            best_params_share_value = {
                                'learning_rate': learning_rate,
                                'max_depth': max_depth,
                                'n_estimators': n_estimators,
                                'colsample_bytree': colsample_bytree,
                                'subsample': subsample
                            }
                            best_model_share_value = model_xgb
                            forecasts_xgb[f"Share_value_{brand}"] = y_pred

    # After completing the grid search for Share_value
    print(f"Best MAE for Share_value_{brand}: {best_mae_share_value:.3f}")
    print(f"Best hyperparameters for Share_value_{brand}:", best_params_share_value)

    # Store
    maes.append(best_mae_share_value)
    maes_xgb[f'Share_value_{brand}'] = best_mae_share_value
    hp_xgb[f'Share_value_{brand}'] = best_params_share_value


    # Now use the predictions of Share_value as a new feature for Share_volume prediction
    # Predict Share_value for test set
    share_value_pred = best_model_share_value.predict(X_test)

    # Add the predictions as a new feature in the test set
    test[f'Share_value_pred_{brand}'] = share_value_pred

    # To store the best model and its MAE for Share_volume
    best_mae_share_volume = np.inf
    best_params_share_volume = None
    best_model_share_volume = None

    # Define train and test for Share_volume prediction (now including Share_value prediction as a feature)
    print(f'Predicting Share_volume for Brand {brand}')

    # Define train and test for Share_volume
    X_train = train[significant_predictors[f'Share_volume_{brand}'] + [f'Share_value_{brand}']]
    X_train = X_train.rename(columns={f'Share_value_{brand}': f'Share_value_pred_{brand}'})
    y_train = train[f'Share_volume_{brand}']
    X_test = test[significant_predictors[f'Share_volume_{brand}'] + [f'Share_value_pred_{brand}']]
    y_test = test[f'Share_volume_{brand}']

    # Iterate over all combinations of hyperparameters for Share_volume
    for n_estimators in param_grid['n_estimators']:
        for max_depth in param_grid['max_depth']:
            for colsample_bytree in param_grid['colsample_bytree']:
                for subsample in param_grid['subsample']:
                    for learning_rate in param_grid['learning_rate']:

                        # Define model for Share_volume
                        model_xgb = xgb.XGBRegressor(
                            n_estimators=n_estimators,
                            max_depth=max_depth,
                            colsample_bytree=colsample_bytree,
                            subsample=subsample,
                            learning_rate=learning_rate,
                            random_state=100531899
                        )

                        model_xgb.fit(X_train, y_train)

                        # Predict and calculate MAE for Share_volume
                        y_pred = model_xgb.predict(X_test)
                        mae = mean_absolute_error(y_test, y_pred)

                        # Check if this is the best model for Share_volume
                        if mae < best_mae_share_volume:
                            best_mae_share_volume = mae
                            best_params_share_volume = {
                                'learning_rate': learning_rate,
                                'max_depth': max_depth,
                                'n_estimators': n_estimators,
                                'colsample_bytree': colsample_bytree,
                                'subsample': subsample
                            }
                            best_model_share_volume = model_xgb
                            forecasts_xgb[f"Share_volume_{brand}"] = y_pred

    # After completing the grid search for Share_volume
    print(f"Best MAE for Share_volume_{brand}: {best_mae_share_volume:.3f}")
    print(f"Best hyperparameters for Share_volume_{brand}:", best_params_share_volume)

    # Store
    maes.append(best_mae_share_volume)
    maes_xgb[f'Share_volume_{brand}'] = best_mae_share_volume
    hp_xgb[f'Share_volume_{brand}'] = best_params_share_volume
Predicting Share_value for Brand 1
Best MAE for Share_value_1: 0.321
Best hyperparameters for Share_value_1: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 0.8}
Predicting Share_volume for Brand 1
Best MAE for Share_volume_1: 0.399
Best hyperparameters for Share_volume_1: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_value for Brand 2
Best MAE for Share_value_2: 0.294
Best hyperparameters for Share_value_2: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 0.6}
Predicting Share_volume for Brand 2
Best MAE for Share_volume_2: 0.145
Best hyperparameters for Share_volume_2: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 200, 'colsample_bytree': 0.6, 'subsample': 1.0}
Predicting Share_value for Brand 3
Best MAE for Share_value_3: 1.856
Best hyperparameters for Share_value_3: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 200, 'colsample_bytree': 0.6, 'subsample': 0.6}
Predicting Share_volume for Brand 3
Best MAE for Share_volume_3: 2.264
Best hyperparameters for Share_volume_3: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.6}
Predicting Share_value for Brand 4
Best MAE for Share_value_4: 0.178
Best hyperparameters for Share_value_4: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 1.0}
Predicting Share_volume for Brand 4
Best MAE for Share_volume_4: 0.175
Best hyperparameters for Share_volume_4: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.6}
Predicting Share_value for Brand 5
Best MAE for Share_value_5: 0.466
Best hyperparameters for Share_value_5: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 0.6}
Predicting Share_volume for Brand 5
Best MAE for Share_volume_5: 0.563
Best hyperparameters for Share_volume_5: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_value for Brand 6
Best MAE for Share_value_6: 0.284
Best hyperparameters for Share_value_6: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 0.6}
Predicting Share_volume for Brand 6
Best MAE for Share_volume_6: 0.267
Best hyperparameters for Share_volume_6: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 200, 'colsample_bytree': 0.6, 'subsample': 1.0}
Predicting Share_value for Brand 7
Best MAE for Share_value_7: 0.612
Best hyperparameters for Share_value_7: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_volume for Brand 7
Best MAE for Share_volume_7: 0.801
Best hyperparameters for Share_volume_7: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_value for Brand 8
Best MAE for Share_value_8: 0.594
Best hyperparameters for Share_value_8: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 1.0}
Predicting Share_volume for Brand 8
Best MAE for Share_volume_8: 0.557
Best hyperparameters for Share_volume_8: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.6, 'subsample': 0.6}
In [100]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['XGBoost 2-step (Val->Vol)', 'MAE'] = mae
MAE = 0.611

We can see how it lowered the MAE, showing that it may be a better approach.

XGBoost 2-Step (Volume -> Value)¶

Now the same steps but predicting first volume and then, with it, the value:

In [101]:
maes = []

for brand in range(1, 8+1):

    # Define train and test for Share_volume
    X_train = train[significant_predictors[f'Share_volume_{brand}']]
    y_train = train[f'Share_volume_{brand}']
    X_test = test[significant_predictors[f'Share_volume_{brand}']]
    y_test = test[f'Share_volume_{brand}']

    # To store the best model and its MAE for Share_volume
    best_mae_share_volume = np.inf
    best_params_share_volume = None
    best_model_share_volume = None

    print(f'Predicting Share_volume for Brand {brand}')
    
    # Iterate over all combinations of hyperparameters for Share_volume
    for n_estimators in param_grid['n_estimators']:
        for max_depth in param_grid['max_depth']:
            for colsample_bytree in param_grid['colsample_bytree']:
                for subsample in param_grid['subsample']:
                    for learning_rate in param_grid['learning_rate']:
        
                        # Define model for Share_volume
                        model_xgb = xgb.XGBRegressor(
                            n_estimators=n_estimators,
                            max_depth=max_depth,
                            colsample_bytree=colsample_bytree,
                            subsample=subsample,
                            learning_rate=learning_rate,
                            random_state=100531899
                        )

                        model_xgb.fit(X_train, y_train)

                        # Predict and calculate MAE for Share_volume
                        y_pred = model_xgb.predict(X_test)
                        mae = mean_absolute_error(y_test, y_pred)

                        # Check if this is the best model for Share_volume
                        if mae < best_mae_share_volume:
                            best_mae_share_volume = mae
                            best_params_share_volume = {
                                'learning_rate': learning_rate,
                                'max_depth': max_depth,
                                'n_estimators': n_estimators,
                                'colsample_bytree': colsample_bytree,
                                'subsample': subsample
                            }
                            best_model_share_volume = model_xgb

    # After completing the grid search for Share_volume
    print(f"Best MAE for Share_volume_{brand}: {best_mae_share_volume:.3f}")
    print(f"Best hyperparameters for Share_volume_{brand}:", best_params_share_volume)
    maes.append(best_mae_share_volume)

    # Now use the predictions of Share_volume as a new feature for Share_value prediction
    # Predict Share_volume for test set
    share_volume_pred = best_model_share_volume.predict(X_test)

    # Add the predictions as a new feature in the test set
    test[f'Share_volume_pred_{brand}'] = share_volume_pred

    # To store the best model and its MAE for Share_value
    best_mae_share_value = np.inf
    best_params_share_value = None
    best_model_share_value = None

    # Define train and test for Share_value prediction (now including Share_volume prediction as a feature)
    print(f'Predicting Share_value for Brand {brand}')

    # Define train and test for Share_value
    X_train = train[significant_predictors[f'Share_value_{brand}'] + [f'Share_volume_{brand}']]
    X_train = X_train.rename(columns={f'Share_volume_{brand}': f'Share_volume_pred_{brand}'})
    y_train = train[f'Share_value_{brand}']
    X_test = test[significant_predictors[f'Share_value_{brand}'] + [f'Share_volume_pred_{brand}']]
    y_test = test[f'Share_value_{brand}']

    # Iterate over all combinations of hyperparameters for Share_value
    for n_estimators in param_grid['n_estimators']:
        for max_depth in param_grid['max_depth']:
            for colsample_bytree in param_grid['colsample_bytree']:
                for subsample in param_grid['subsample']:
                    for learning_rate in param_grid['learning_rate']:

                        # Define model for Share_value
                        model_xgb = xgb.XGBRegressor(
                            n_estimators=n_estimators,
                            max_depth=max_depth,
                            colsample_bytree=colsample_bytree,
                            subsample=subsample,
                            learning_rate=learning_rate,
                            random_state=100531899
                        )

                        model_xgb.fit(X_train, y_train)

                        # Predict and calculate MAE for Share_value
                        y_pred = model_xgb.predict(X_test)
                        mae = mean_absolute_error(y_test, y_pred)

                        # Check if this is the best model for Share_value
                        if mae < best_mae_share_value:
                            best_mae_share_value = mae
                            best_params_share_value = {
                                'learning_rate': learning_rate,
                                'max_depth': max_depth,
                                'n_estimators': n_estimators,
                                'colsample_bytree': colsample_bytree,
                                'subsample': subsample
                            }
                            best_model_share_value = model_xgb

    # After completing the grid search for Share_value
    print(f"Best MAE for Share_value_{brand}: {best_mae_share_value:.3f}")
    print(f"Best hyperparameters for Share_value_{brand}:", best_params_share_value)

    maes.append(best_mae_share_value)
Predicting Share_volume for Brand 1
Best MAE for Share_volume_1: 0.281
Best hyperparameters for Share_volume_1: {'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 200, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_value for Brand 1
Best MAE for Share_value_1: 0.331
Best hyperparameters for Share_value_1: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.6, 'subsample': 0.8}
Predicting Share_volume for Brand 2
Best MAE for Share_volume_2: 0.232
Best hyperparameters for Share_volume_2: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 200, 'colsample_bytree': 0.8, 'subsample': 1.0}
Predicting Share_value for Brand 2
Best MAE for Share_value_2: 0.346
Best hyperparameters for Share_value_2: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 1.0}
Predicting Share_volume for Brand 3
Best MAE for Share_volume_3: 2.943
Best hyperparameters for Share_volume_3: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.6, 'subsample': 0.6}
Predicting Share_value for Brand 3
Best MAE for Share_value_3: 1.961
Best hyperparameters for Share_value_3: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.6, 'subsample': 1.0}
Predicting Share_volume for Brand 4
Best MAE for Share_volume_4: 0.232
Best hyperparameters for Share_volume_4: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.8}
Predicting Share_value for Brand 4
Best MAE for Share_value_4: 0.177
Best hyperparameters for Share_value_4: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.6, 'subsample': 1.0}
Predicting Share_volume for Brand 5
Best MAE for Share_volume_5: 0.696
Best hyperparameters for Share_volume_5: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_value for Brand 5
Best MAE for Share_value_5: 0.654
Best hyperparameters for Share_value_5: {'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 300, 'colsample_bytree': 0.6, 'subsample': 1.0}
Predicting Share_volume for Brand 6
Best MAE for Share_volume_6: 0.194
Best hyperparameters for Share_volume_6: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 300, 'colsample_bytree': 1.0, 'subsample': 0.8}
Predicting Share_value for Brand 6
Best MAE for Share_value_6: 0.173
Best hyperparameters for Share_value_6: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 0.8}
Predicting Share_volume for Brand 7
Best MAE for Share_volume_7: 1.111
Best hyperparameters for Share_volume_7: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 1.0}
Predicting Share_value for Brand 7
Best MAE for Share_value_7: 0.622
Best hyperparameters for Share_value_7: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 200, 'colsample_bytree': 0.8, 'subsample': 0.6}
Predicting Share_volume for Brand 8
Best MAE for Share_volume_8: 0.583
Best hyperparameters for Share_volume_8: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.6}
Predicting Share_value for Brand 8
Best MAE for Share_value_8: 0.545
Best hyperparameters for Share_value_8: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 1.0}
In [102]:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['XGBoost 2-step (Vol->Val)', 'MAE'] = mae
MAE = 0.693

It does not seem that is as good as the one before. The differences between them are not big, however, when using the XGBoost model we will use the Value->Volume.

Results¶

In this section, we will analyze the results that we have been storing to see which are the best models.

Average results¶

In [ ]:
results_df
Out[ ]:
MAE
Naive Median 1.534523
Naive Seasonal 1.751527
ETS 0.709040
SARIMA 1.084935
Dynamic Regression 1.221793
SARIMAX B 0.976661
SARIMAX S 0.884661
VAR 1.129453
VARX B 1.338915
VARX S 1.372146
RandomForest 0.749665
XGBoost 0.679799
XGBoost 2-step (Val->Vol) 0.611041
XGBoost 2-step (Vol->Val) 0.692552

For the best 3 models we will select ETS, RandomForest and XGBoost (between the 3 XGBoost models, we will use Val->Vol).

Comparison of the best models¶

Analyzing the MAES for each target:

In [104]:
mae_df = pd.DataFrame([maes_ets, maes_rf, maes_xgb], index=['ETS', 'RandomForest', 'XGBoost'])
mae_df
Out[104]:
Share_value_1 Share_value_2 Share_value_3 Share_value_4 Share_value_5 Share_value_6 Share_value_7 Share_value_8 Share_volume_1 Share_volume_2 Share_volume_3 Share_volume_4 Share_volume_5 Share_volume_6 Share_volume_7 Share_volume_8
ETS 1.321972 0.542937 0.527984 0.370579 1.152535 0.347983 0.335178 0.872249 1.393446 0.442263 0.831097 0.713346 1.156969 0.111053 0.547169 0.677874
RandomForest 0.646528 0.753930 1.277485 0.207588 0.550040 0.344257 0.784029 0.602806 0.445725 0.448417 2.665672 0.255749 0.753517 0.295377 1.316605 0.646923
XGBoost 0.321180 0.294079 1.855562 0.178255 0.465786 0.284482 0.612276 0.594060 0.399476 0.145358 2.263912 0.174784 0.562565 0.266645 0.801301 0.556941

There is usually a "clear winner" for each of the targets. Here is the final decision for what model to use for each target:

  • Share_value_1: XGB

  • Share_value_2: XGB

  • Share_value_3: ETS

  • Share_value_4: XGB

  • Share_value_5: XGB

  • Share_value_6: XGB

  • Share_value_7: ETS

  • Share_value_8: XGB

  • Share_volume_1: XGB

  • Share_volume_2: XGB

  • Share_volume_3: ETS

  • Share_volume_4: XGB

  • Share_volume_5: XGB

  • Share_volume_6: ETS

  • Share_volume_7: ETS

  • Share_volume_8: XGB

It seems that ETS model worked better for some targets, while XGBoost was the best for most of them (and always better than RandomForest, so doing an average model is not very promising).

Final forecast¶

With these final models, let's do the final forecast for the next 52 weeks. We will first create the necessary predictors using the same method as before (ETS for DP and Price, due to it being the best automatic model for the targets). Then, we will forecast Share_value for each brand using either ETS or XGBoost, and Share_volume using ETS of XGBoost with the share value as added predictor to account for the correlation between both targets. Finally, we will plot the forecasts and store them in the correct original format for the final delivery.

In [ ]:
dates = pd.date_range(start="2024-02-04", periods=52, freq="W-SUN")
df_forecasted = pd.DataFrame(index=dates)

Creating the predictors¶

Seasonal indicators¶

In [108]:
# Create Week number, Month and Year featuers
df_forecasted['Week_number'] = df_forecasted.index.to_series().apply(lambda x: int(x.strftime("%W")))
df_forecasted['Month'] = df_forecasted.index.month
df_forecasted['Year'] = df_forecasted.index.year
df_forecasted = df_forecasted.asfreq('W-SUN')

Last year values¶

In [109]:
for i in range(1, 9):
    df_forecasted[f'Share_value_{i}_ly'] = df.tail(52)[f'Share_value_{i}'].values
    df_forecasted[f'Share_volume_{i}_ly'] = df.tail(52)[f'Share_volume_{i}'].values

Forecasting DP and Price¶

In [ ]:
predictors = ['DP', 'Price']

forecasts_dp = {}
forecasts_dp_lci = {}
forecasts_dp_uci = {}
forecasts_price = {}
forecasts_price_lci = {}
forecasts_price_uci = {}

for feature in predictors:

    for brand in range(1, 8+1):

        model_ets = ETSModel(
            df[f"{feature}_{brand}"],
            trend='add',
            seasonal='add',
            seasonal_periods=52
        ).fit()

        pred = model_ets.get_prediction(start=df_forecasted.index[0], end=df_forecasted.index[-1])
        cis = pred.pred_int(alpha = .05) #confidence interval
        lower_ci = cis['lower PI (alpha=0.050000)']
        upper_ci = cis['upper PI (alpha=0.050000)']
        forecast = model_ets.forecast(steps=52)

        df_forecasted.loc[:, f"{feature}_{brand}"] = forecast

        if feature == 'DP':
            forecasts_dp[f"{feature}_{brand}"] = forecast
            forecasts_dp_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_dp_uci[f"{feature}_{brand}"] = upper_ci
        else:
            forecasts_price[f"{feature}_{brand}"] = forecast
            forecasts_price_lci[f"{feature}_{brand}"] = lower_ci
            forecasts_price_uci[f"{feature}_{brand}"] = upper_ci
In [111]:
# Create a 4x2 subplot for DP forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten()  # Flatten the array of axes for easier indexing

# Loop through each brand and plot the forecasts
for brand in range(1, 9):  # Brands are 1 to 8
    # Define the feature and the key for the forecast
    feature = "DP"
    forecast_key = f"{feature}_{brand}"

    # Get the forecast for the current brand
    forecast = forecasts_dp[forecast_key]
    lower_ci = forecasts_dp_lci[forecast_key]
    upper_ci = forecasts_dp_uci[forecast_key]

    # Select the appropriate axis for this brand
    ax = axs[brand - 1]

    # Plot the train and forecast values
    ax.plot(df.index, df[f"{feature}_{brand}"], label="Train", color='blue')
    ax.plot(df_forecasted.index, forecast, label="Forecast", color='red')
    ax.fill_between(df_forecasted.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')

    # Set labels and title
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    #ax.set_title(f"Brand {brand}")
    ax.legend()

    # Set x-axis limits
    #ax.set_xlim(pd.to_datetime(['2023-01-29', '2025-01-26']))

# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
No description has been provided for this image
In [112]:
# Create a 4x2 subplot for Price forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten()  # Flatten the array of axes for easier indexing

# Loop through each brand and plot the forecasts
for brand in range(1, 9):  # Brands are 1 to 8
    # Define the feature and the key for the forecast
    feature = "Price"
    forecast_key = f"{feature}_{brand}"

    # Get the forecast for the current brand
    forecast = forecasts_price[forecast_key]
    lower_ci = forecasts_price_lci[forecast_key]
    upper_ci = forecasts_price_uci[forecast_key]

    # Select the appropriate axis for this brand
    ax = axs[brand - 1]

    # Plot the train, test, and forecast values
    ax.plot(df.index, df[f"{feature}_{brand}"], label="Train", color='blue')
    ax.plot(df_forecasted.index, forecast, label="Forecast", color='red')
    ax.fill_between(df_forecasted.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')

    # Set labels and title
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    #ax.set_title(f"Brand {brand}")
    ax.legend()

    # Set x-axis limits
    #ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))

# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
No description has been provided for this image
In [113]:
df_forecasted.columns
Out[113]:
Index(['Week_number', 'Month', 'Year', 'Share_value_1_ly', 'Share_volume_1_ly',
       'Share_value_2_ly', 'Share_volume_2_ly', 'Share_value_3_ly',
       'Share_volume_3_ly', 'Share_value_4_ly', 'Share_volume_4_ly',
       'Share_value_5_ly', 'Share_volume_5_ly', 'Share_value_6_ly',
       'Share_volume_6_ly', 'Share_value_7_ly', 'Share_volume_7_ly',
       'Share_value_8_ly', 'Share_volume_8_ly', 'DP_1', 'DP_2', 'DP_3', 'DP_4',
       'DP_5', 'DP_6', 'DP_7', 'DP_8', 'Price_1', 'Price_2', 'Price_3',
       'Price_4', 'Price_5', 'Price_6', 'Price_7', 'Price_8'],
      dtype='object')
In [114]:
pd.concat([df_forecasted.head(2), df_forecasted.tail(2)])
Out[114]:
Week_number Month Year Share_value_1_ly Share_volume_1_ly Share_value_2_ly Share_volume_2_ly Share_value_3_ly Share_volume_3_ly Share_value_4_ly ... DP_7 DP_8 Price_1 Price_2 Price_3 Price_4 Price_5 Price_6 Price_7 Price_8
2024-02-04 5 2 2024 14.843 11.146 4.158 2.651 24.463 32.764 1.769 ... 57.372966 69.452043 20.709070 27.439636 12.234649 12.260619 24.397274 16.169999 9.703958 23.801723
2024-02-11 6 2 2024 13.602 9.986 5.214 3.217 25.293 33.504 2.563 ... 57.977274 69.728657 20.679497 27.668667 12.122071 11.631634 24.371309 16.325158 9.723921 23.259851
2025-01-19 2 1 2025 16.559 13.254 3.016 1.802 24.347 31.236 1.903 ... 51.904245 69.669641 20.611923 28.300568 12.807317 12.564034 24.863842 17.287849 10.283520 23.813281
2025-01-26 3 1 2025 15.840 12.171 3.096 1.872 25.325 33.422 2.135 ... 53.112899 70.276469 21.097956 27.691367 12.517292 12.325684 25.281518 17.334951 10.281538 23.633304

4 rows × 35 columns

Share Value¶

In [115]:
feature = 'Share_value'

1, 2, 4, 5, 6, 8 - XGB¶

In [ ]:
brands = [1, 2, 4, 5, 6, 8]

for brand in brands:

    X_train = df[significant_predictors[f'{feature}_{brand}']]
    y_train = df[f'{feature}_{brand}']
    X_pred = df_forecasted[significant_predictors[f'{feature}_{brand}']]

    model = xgb.XGBRegressor(
        **hp_xgb[f'{feature}_{brand}'],
        random_state=100531899
    )

    model.fit(X_train, y_train)

    forecast = model.predict(X_pred)
    df_forecasted[f'{feature}_{brand}'] = forecast

    plt.figure(figsize=(12, 4))
    plt.plot(df.index, df[f"{feature}_{brand}"], label="Train")
    plt.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast")
    plt.xlabel("Date")
    plt.ylabel(f"{feature}_{brand}")
    plt.legend()
    plt.grid()
    plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

3, 7 - ETS¶

In [ ]:
brands = [3, 7]

figures = {}  # Dictionary to store figures

for brand in brands:

    y_train = df[f'{feature}_{brand}']

    model_ets = ETSModel(
        y_train,
        trend='add',
        seasonal='add',
        seasonal_periods=52
    ).fit()

    pred = model_ets.get_prediction(start =  df_forecasted.index[0], end = df_forecasted.index[-1])
    cis = pred.pred_int(alpha = .05) #confidence interval
    lower_ci = cis['lower PI (alpha=0.050000)']
    upper_ci = cis['upper PI (alpha=0.050000)']
    forecast = model_ets.forecast(steps=len(df_forecasted))

    df_forecasted[f'{feature}_{brand}'] = forecast

    # Create the figure but don't show it yet
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(df.index, df[f"{feature}_{brand}"], label="Train")
    ax.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast")
    ax.fill_between(df_forecasted.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    ax.legend()
    ax.grid()

    # Store the figure
    figures[f"{feature}_{brand}"] = fig
    plt.close(fig)
In [118]:
for key, fig in figures.items():
    display(fig)
No description has been provided for this image
No description has been provided for this image

Share Volume¶

In [119]:
feature = 'Share_volume'

1, 2, 4, 5, 8 - XGB (with Value as extra-predictor)¶

In [ ]:
brands = [1, 2, 4, 5, 8]

for brand in brands:

    X_train = df[significant_predictors[f'Share_volume_{brand}'] + [f'Share_value_{brand}']]
    y_train = df[f'Share_volume_{brand}']
    X_pred = df_forecasted[significant_predictors[f'Share_volume_{brand}'] + [f'Share_value_{brand}']]

    model = xgb.XGBRegressor(
        **hp_xgb[f'{feature}_{brand}'],
        random_state=100531899
    )

    model.fit(X_train, y_train)

    forecast = model.predict(X_pred)
    df_forecasted[f'{feature}_{brand}'] = forecast

    plt.figure(figsize=(12, 4))
    plt.plot(df.index, df[f"{feature}_{brand}"], label="Train")
    plt.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast")
    plt.xlabel("Date")
    plt.ylabel(f"{feature}_{brand}")
    plt.legend()
    plt.grid()
    plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

3, 6, 7 - ETS¶

In [ ]:
brands = [3, 6, 7]

figures = {}

for brand in brands:

    y_train = df[f'{feature}_{brand}']

    model_ets = ETSModel(
        y_train,
        trend='add',
        seasonal='add',
        seasonal_periods=52
    ).fit()

    pred = model_ets.get_prediction(start =  df_forecasted.index[0], end = df_forecasted.index[-1])
    cis = pred.pred_int(alpha = .05) #confidence interval
    lower_ci = cis['lower PI (alpha=0.050000)']
    upper_ci = cis['upper PI (alpha=0.050000)']
    forecast = model_ets.forecast(steps=len(df_forecasted))

    df_forecasted[f'{feature}_{brand}'] = forecast

    # Create the figure but don't show it yet
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(df.index, df[f"{feature}_{brand}"], label="Train")
    ax.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast")
    ax.fill_between(df_forecasted.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    ax.legend()
    ax.grid()

    # Store the figure
    figures[f"{feature}_{brand}"] = fig
    plt.close(fig)
In [122]:
for key, fig in figures.items():
    display(fig)
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Storing the forecasts¶

In [129]:
df_forecasted.head()
Out[129]:
Week_number Month Year Share_value_1_ly Share_volume_1_ly Share_value_2_ly Share_volume_2_ly Share_value_3_ly Share_volume_3_ly Share_value_4_ly ... Share_value_3 Share_value_7 Share_volume_1 Share_volume_2 Share_volume_4 Share_volume_5 Share_volume_8 Share_volume_3 Share_volume_6 Share_volume_7
2024-02-04 5 2 2024 14.843 11.146 4.158 2.651 24.463 32.764 1.769 ... 24.888220 12.334084 11.401079 2.150232 2.619908 5.323566 7.860556 32.563128 2.151332 20.402724
2024-02-11 6 2 2024 13.602 9.986 5.214 3.217 25.293 33.504 2.563 ... 25.454229 12.454570 11.206799 2.150885 2.641761 5.376821 8.735213 33.178125 2.070866 20.279669
2024-02-18 7 2 2024 14.460 10.608 4.264 2.610 24.960 33.158 2.064 ... 25.846283 12.963166 10.751272 2.194237 2.602626 5.376821 8.126440 33.918602 1.948378 20.793714
2024-02-25 8 2 2024 14.605 10.924 4.129 2.471 25.970 34.558 2.459 ... 26.396933 12.584542 11.432586 2.144854 2.614239 6.169658 7.919020 34.736441 1.745369 19.919189
2024-03-03 9 3 2024 14.431 10.678 4.610 2.796 25.161 33.451 2.171 ... 25.767499 12.578038 10.578436 2.148454 2.595105 5.936694 7.642549 33.821768 1.996919 20.604655

5 rows × 51 columns

In [127]:
# Reset index to access the date
df_forecasted_i = df_forecasted.reset_index().rename(columns={'index': 'Date_Monday'})

# Create a list to store the reformatted data
long_format = []

# Iterate through brands
for brand in range(1, 9):
    long_format.append(pd.DataFrame({
        "Brand": f"Brand{brand}",
        "Date_Monday": df_forecasted_i["Date_Monday"],
        "Share_value_pred": df_forecasted_i[f"Share_value_{brand}"],
        "Share_volume_pred": df_forecasted_i[f"Share_volume_{brand}"]
    }))

# Concatenate all brands into a single DataFrame
df_final = pd.concat(long_format, ignore_index=True)

# Store it
df_final.to_csv("RJ.csv", index=False)

# Display the final output
df_final
Out[127]:
Brand Date_Monday Share_value_pred Share_volume_pred
0 Brand1 2024-02-04 15.622586 11.401079
1 Brand1 2024-02-11 15.247246 11.206799
2 Brand1 2024-02-18 14.918377 10.751272
3 Brand1 2024-02-25 15.253229 11.432586
4 Brand1 2024-03-03 14.857484 10.578436
... ... ... ... ...
411 Brand8 2024-12-29 14.912605 11.827853
412 Brand8 2025-01-05 12.019563 8.217236
413 Brand8 2025-01-12 11.629362 7.880356
414 Brand8 2025-01-19 11.715905 7.999640
415 Brand8 2025-01-26 11.905483 8.101236

416 rows × 4 columns

Conclusions¶

In this project, we have analyze a complex time-series dataset. We have checked how the relationships between brands, predictors and targets was not trivial at all (except for the high correlation between the share value and volume for a certain brand). However, thanks to the dynamic regression we have checked which are the significant predictors for each target and we have used it for the final models, although there were some targets that did not have significant predictors and were best forecasted with an ETS model instead of the XGBoost model, which showed to be the best non-automatic model.

Forecasting the DP and Price predictors with ETS and using them for other models showed to give better results than forecasting the targets directly with ETS for most of the targets, showing how these predictors are probably easier to predict and the more complex ML models can get this ETS information and mix it with other useful predictors. However, these methods do not give us option to compute the confidence intervals from the ETS models.

Finally, we can see that our forecast achieve to get the seasonality for each target, predicting well those clear peaks and valleys that appear in the same months and the nature of the values.

In [154]:
# Create a 4x2 subplot
fig, axs = plt.subplots(2, 4, figsize=(16, 5))
axs = axs.flatten()

# Share values
for brand in range(1, 9):

    feature = "Share_value"

    # Select the appropriate axis for this brand
    ax = axs[brand - 1]

    # Plot the train andbforecast values
    ax.plot(df.index, df[f"{feature}_{brand}"], label="Train", color='blue')
    ax.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast", color='red')

    # Labels
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    ax.tick_params(axis='x', rotation=30)
    ax.legend()
    
# Adjust and show
plt.tight_layout()
plt.show()
No description has been provided for this image
In [155]:
# Create a 4x2 subplot
fig, axs = plt.subplots(2, 4, figsize=(16, 5))
axs = axs.flatten()

# Share values
for brand in range(1, 9):

    feature = "Share_volume"

    # Select the appropriate axis for this brand
    ax = axs[brand - 1]

    # Plot the train andbforecast values
    ax.plot(df.index, df[f"{feature}_{brand}"], label="Train", color='blue')
    ax.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast", color='red')

    # Labels
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    ax.tick_params(axis='x', rotation=30)
    ax.legend()
    
# Adjust and show
plt.tight_layout()
plt.show()
No description has been provided for this image